
Getting the Most out of Windows Azure Storage

[Getting the most out of Windows Azure Storage—Joe Giardino] All right, well welcome. Congratulations on getting all the way to this building and this room at 8:30 in the morning. I'm very impressed. It took me a while. So today's session is all about best practices, throughput, performance, squeezing the most out of Azure Storage as you possibly can. We're going to cover client best practices, server best practices. We're going to talk about some of the work we've been doing on the back end and some of the work we've been doing on the client SDKs, show you some cool features, and then we're going to show a lot of code, a lot of demos just kind of in-real-life type stuff, and then also cover some future directions and there'll be Q&A at the end. So I skipped a slide; my name is Joe Giardino. I'm a Senior Dev Lead on Azure Storage. I work specifically on the client libraries and the SDK piece. I do a lot of testing and scenario evaluations. So this, if you're new to Azure, this is what we look like. This map is always changing as they're doing new build-outs, new regions. I think they just recently announced a beta in China. They're always updating where we're deployed. Each region on here is actually 2 data centers. There's a primary and a secondary that's several hundred miles away so that we can asynchronously replicate data in case of a disaster. And generally around all of these data centers is a series of CDN endpoints to facilitate high performance, massive content delivery, things like that. So just the real quick intro. Azure Storage is blobs, tables, queues, and disks. Disks are just durable NTFS volumes backed by the same storage for blobs that you can use to host VMs. Blobs are essentially the file system in the cloud, and tables are the massively scalable NoSQL storage. So schema-less data store. And then queues provide basically guaranteed message delivery. We expose all this over REST. 
We know that everyone loves writing REST, so we also have client libraries, and we've tried to include some of the best practices and that'll allow you to get up and running pretty quickly. [Scale] All right, so let's jump right into it—scale. So these are the scalability targets for a given storage account. These are any account created after June 7th 2012. You can check this in the portal. The capacity's up to 200 terabytes. You can have 20,000 transactions per second. And then the bandwidth targets for Geo Redundant and Locally Redundant storage are down there. You can see that they're different; it's 5 and 10 for Geo Redundant and 10 and 15 for Locally Redundant. So you can actually disable the geo replication of your data to the secondary data center if you like. I think it's like 27% cheaper that way. But that's your choice. The only note about that is geo replication, that cost is built in. If you do turn it off and then back on, there is a cost to re-bootstrap your account. So try to decide "do I need geo replication or not?" and kind of stick with that decision. So the next level is the partition level. So again, same thing for the accounts after June 7th. A Single Queue can provide 2000 messages a second. A Single Table Partition, which essentially means all the entities with the same partition key, is 2000 entities per second. And then a Single Blob is 60 megabytes per second. Again these are targets because this is— the behavior of the back end is to automatically try to load balance to meet these targets, so you may get some performance above this and then it'll scale out depending on the multi-tenant— what's going on on the server, essentially. So there's a great blog post you can read about some of the new hardware and new networking stuff that's been going on our data centers. Essentially they move from 1 gigabit to 10 gigabit. They flatten the network rather than having multiple-level hierarchical design. 
And so all of this stuff all up is what's enabling some of these high bandwidth things like high performance computing, Map Reduce, Hadoop, IaaS disks, things like that. So if you want to know more about what's going on at the data center you can read that. And so this is a key concept we're going to discuss today. What happens on the server? How do we autoscale so that we can scale up to meet your needs, and we can hit truly web-scale workloads. So what you have here is just a very high-level view of what a storage cluster looks like. At the bottom there you can see we have a DFS layer. This is the durable file system. It's responsible for replication and for data consistency, things like that. On top of that sits the partitioning layer. And this is responsible for essentially the load balancing, right? And so as data becomes hot, it can actually split that out and be hosted by more and more servers. When that happens a key thing to note is that no data is actually physically being moved around. It's just that different servers are now serving different data. On top of that is our front end. That's responsible for authentication and forwarding the message to the right server that's serving that data. And then you have the virtual IP or the load balancer at the top. So let's take a quick example. I have my account here, several partition servers serving my data. And I get a request that comes in, request 1. The FE has a cached map of where these partitions are being served by which server. So it says "okay, this is on partition server 3" and it routes it accordingly. But I get another request, request 2. It comes in another FE. And now the partition is actually above its target, above the scalability limits, and so we detect that this is hot. And so the table master—the partition master here detects this, and it will actually unassign that range partition and reassign it to a secondary server. 
So now as these requests come in, request 2 gets forwarded to that secondary partition server, and the load is now load balanced across those servers. And this is always happening in real time to try to meet these scalability targets in a multi-tenant system. And so this is going to be really fundamental to a lot of the things we're going to touch on later in the talk here. If you want to know more about the durable file system and all the really cool stuff inside there, there's a great paper that was part of the ACM SOSP, a symposium. And you can read all about CAP theorem and a bunch of these different internal things. The 1 point I want to highlight here is that we aren't shuffling bytes back and forth between different disks. We're only changing the servers that are hosting it. So in real life, who is using it? Xbox is serving huge workloads using blobs, tables, queues. They're using Cloud Game Save. Halo 4 is using it, music, Kinect data. SkyDrive is using blobs to store pictures and documents. Bing's social integration feature, which indexes Twitter and Facebook, is using blobs, tables, and queues. And then Skype is also using blobs, tables, and queues for video messaging. So this is really, really working well for some very large clients internally. All right, so performance, this is the meat of what we're going to get at. Quick notes: all of our demos are going to be using the upcoming 2.1 release of our storage client. So we've got a lot of performance improvements and also some cool features. I have a live demo deployed in a production cluster with a new account that's past that June 2012 date. And then I have compute co-located within the VM instance. And then I'm running on a Server 2012 IaaS VM. So from the client perspective. So those of you who have been developing storage may have used the .NET SDK that we've released. 
Recently in the fall of last year we released a big 2.0 update, and it was all about expanding platform reach and a bunch of performance improvements. And so we had an internal client that moved from 1.7 to 2.0. They're doing heavy table workloads and basically, based on their performance numbers, they were able to reduce the size of their VM deployment by almost 3x. And they were saying VM to VM, CPU was cut mostly in half. And we're going to take that work and build on it in 2.1. So this is all about performance, fundamentals. We have a bunch of features that people have been asking for as well. Last I checked, the code coverage was over 90%. We've got 900 public test cases. So you can go to GitHub and download these and run them or even steal the code for your own scenario. Performance, we've got an automated performance suite so we can do A/B testing for different features, for different scenarios. So we can make sure that—how are we behaving on the GC, for example, in this different scenario? And so this has kind of helped us dial in the performance for the different workloads. And then stress, we actually are running all of our SDKs through a stress framework internally. So we pass terabytes and terabytes of data through these things, every single API, validate data consistency, things like that and see what happens when you push it to the limit when there's no more threads available in the thread pool and you have low memory and things like that. So all this will be available soon. This is the NuGet place where you can get the current 2.0 release and then the RC will be coming out in the next couple weeks. So what's new? A big feature people have been asking for is Async Task, so we have that with full support for cancellation. 
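As a rough sketch of what those new Task-based APIs with cancellation look like in the 2.1 client—the connection string, container, and blob names here are placeholders, and exception behavior is simplified:

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

class CancellationSketch
{
    static async Task DownloadWithCancellationAsync()
    {
        CloudStorageAccount account = CloudStorageAccount.Parse("<connection string>");
        CloudBlockBlob blob = account.CreateCloudBlobClient()
                                     .GetContainerReference("mycontainer")
                                     .GetBlockBlobReference("myblob");

        // Cancel from the application side after 30 seconds.
        using (var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30)))
        using (var stream = new MemoryStream())
        {
            try
            {
                await blob.DownloadToStreamAsync(stream, cts.Token);
            }
            catch (OperationCanceledException)
            {
                // The token fired; depending on timing the cancellation may
                // also surface wrapped inside a StorageException.
            }
        }
    }
}
```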
Byte Array, Text, and File Upload—this is one of the most requested features that used to be there in 1.7, we took it out for 2.0 to minimize the API scope, and then people really wanted it back so it's there including some range methods for the byte arrays. So there's no more pre-buffering of data, which is very nice for performance. We have an IQueryable provider for the new table layer. This is really helpful. People that are migrating from 1.7 to 2.1 now will be able to take those queries, more or less, and port them. But we've done a lot of really Azure-specific optimizations. So continuations and things like that are much, much faster. On top of that, we've actually optimized this for a NoSQL, schema-less world. So IQueryable very much likes to be fixed to a given type. It likes to know that this table has these entities of this type. And so we've done some optimizations to let it work with multiple types in different places. We'll look at some code for that later. And then buffer pooling, we've actually exposed an interface to allow you to hook up a buffer pool. And it looks remarkably like the buffer pool in System.ServiceModel.Channels. And so you can basically hook up to the WCF buffer pool. If you're running a live service at scale, this is really helpful especially with large objects, anything over 85,000 bytes. You're not constantly allocating and GCing these things. We have a new multi-buffer memory stream. Part of our performance analysis showed us that the .NET memory stream has a couple of quirks that we wanted to work around. And so we have a consistently performant memory stream internally for cases where we have to buffer any data. We've moved the .NET MD5 to be the default. So in 2.0 we supported a FISMA-compliant native implementation for people that needed that, which is great. However, we saw that there was a roughly 20% perf hit in doing that. 
So we've actually reverted to the .NET MD5 implementation by default, which still was what was there in 1.7, et cetera. But if you want to you can still get the FISMA-compliant one by setting a flag. And then we have compiled expressions for tables. This is really, really cool stuff where basically you get the flexibility of run-time reflection without the performance hit. And so we'll look at some of that stuff as well. All right, so what can you do as a client? Follow best practices for each service. We're going to touch on some high-level points of each service, what you can do, access patterns, optimizations, things like that. And then identify what you're trying to optimize for. Is it latency? Is it throughput? Is it cost? And then have a plan for how you're going to provide yourself with the knobs to control the different aspects of your application. And then you want to identify the scalability targets and your access patterns. So specifically for tables, for example, right? How am I going to query my data? That really matters in how I store my data and how I shape my data with the partition keys and row keys. I also need to have a plan—what happens when my service doubles or triples, right? If I'm at 2000 entities a second today, and I don't have a plan for scalability, and I get 10,000 tomorrow, what happens to my application? In the client we provide timeouts, both server-side and client-side, and retry policies. So you can have much more control over latency. So if you're driving a UI or a website or a service, you can have a maximum execution time that includes any potential retries or backoffs or multiple attempts, things like that. And then you can fine-tune the retry depending on what type of exceptions you want to retry on and things like that. Use the latest NuGet package. We are releasing to NuGet much faster, roughly every few weeks to a month, than the traditional SDK releases, which normally take a few months. 
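Going back to the timeout and retry knobs mentioned a moment ago, wiring them up per-request might look like this sketch—the specific timeout values and retry parameters are illustrative, not recommendations:

```csharp
using System;
using System.IO;
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.WindowsAzure.Storage.RetryPolicies;

class TimeoutSketch
{
    // `blob` and `output` are assumed to exist already.
    static void DownloadWithKnobs(CloudBlockBlob blob, Stream output)
    {
        var options = new BlobRequestOptions
        {
            // Timeout the service enforces on each individual attempt.
            ServerTimeout = TimeSpan.FromSeconds(30),
            // Client-side cap across all retries and backoffs (new in 2.1) —
            // this is the "maximum execution time" mentioned above.
            MaximumExecutionTime = TimeSpan.FromSeconds(120),
            // Exponential backoff: ~3s delta, up to 4 attempts.
            RetryPolicy = new ExponentialRetry(TimeSpan.FromSeconds(3), 4)
        };

        blob.DownloadToStream(output, accessCondition: null, options: options);
    }
}
```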
This is always tied to the latest service release, has the latest performance fixes—well, improvements—and any sort of bug fixes. There's a huge change log; you can get all the source at GitHub, which is always released at the same time. All right, so some general .NET client-specific best practices. ServicePointManager has a bunch of big, awesome flags that you'll want to look at, but you've got to be careful to set these as like the very first thing in your application (or in the App.config) because it caches the service points. Meaning that if you set them after you make a request to a given endpoint, it's not going to affect the behavior. So Nagle is an algorithm for small requests where it tries to wait for multiple requests to fill the TCP window. If you have a small request, you want to disable this because it'll affect the latency, so queue messages specifically. Disable Expect: 100-continue. .NET 4 is actually—you can install .NET 4.5 on top of it, still target .NET 4, but take advantage of the improved .NET 4.5 GC. There's a great blog post there about the background GC and the performance. There's a lot of amazing optimizations there. So I highly recommend it. There's a couple of inconsistencies with the URI escaping class, things like that, that we're still working through. So we're still recommending targeting .NET 4. Avoid the large object heap especially if you're running a large service or some sort of middle tier, like a WCF service, things like that. As you allocate these bigger objects, they're more expensive to clean up, a little more expensive on memory. And then re-use any buffers if possible. Again, the less time you spend in GC, the more time for your process to do meaningful work. For latency sensitive scenarios, there's a new feature in the GC in .NET 4.5 called SustainedLowLatency. This is something you can enable to do more frequent, small GCs so that you won't have any large pause at any given time. 
And so you can have a more guaranteed, more predictable environment so you don't have a 30 millisecond GC pause or 10 millisecond even. It will use more memory, but you have more consistent performance out of that. As we've been evaluating some of these scenarios, we've also identified a few performance gotchas. So we figured we'd throw those in there as well. Again, ServicePointManager must be the first thing in your application. DefaultConnectionLimit—the .NET DefaultConnectionLimit is 2. So we have a bunch of people that come and say "I've got 80 workers. I've got an XL VM. I've got a gigabit NIC. I'm getting very low throughput." And one of the first questions is what's your DefaultConnectionLimit? And a lot of times if you don't set it, it's 2. So even though you've got 80 concurrent things happening, 78 of them are waiting for a connection. All right, so simply setting this higher, we've seen people just improve their performance dramatically. Again, do this right at the top of your application. And then the .NET memory stream, we kind of touched on this. Async performance isn't as good as it could be. They actually have a comment in the code specifically to avoid the Read and WriteByte methods because they allocate a 1-byte array. And then the big one is the dynamic buffer size behavior. So there are, I think, 7 different constructors for MemoryStream that you can use in an application. You can give it a fixed size or an underlying byte array buffer, and that'll be fine because it knows that it's a fixed size. But if you don't give it a fixed size, if you just allocate a memory stream and write to it, it goes through this dynamic mode. What that means is they first allocate a 256-byte buffer. When you write 257 bytes, it allocates a 512-byte buffer and copies the data. When you write 513, it does 1024 and copies the data. 
What this means is for a meaningful size buffer, especially let's say 64 megs, you've actually allocated and moved data 19 times, 10 of those on the large object heap which we've already said we want to avoid. So again this is part of the reason for our internal implementation. If you want to look at the source you can go grab it on GitHub when it's available. But just something to know for your own application because you'll spend more time creating buffers, moving data. That's not actually doing meaningful work for your application. And then unbounded TPL parallelism—people as they move to the Task Parallel Library find a lot of really cool things, and then they don't provide themselves with the knobs to control total application layer concurrency. And so they'll do a Parallel.ForEach on all these blobs, initiate like 1000 requests at the same time, they'll get throttled, they'll have issues with very peaky throughput. One of the best practices is you always want to be able to control the concurrency in your application. And so there's a blog post there where it actually shows you how to create a task scheduler that does just that. So that you can schedule tasks and know that you have N of them running at any given point. All right, so the server best practices. You want to locate storage accounts close to users. If you're running compute in the DC, obviously you want to put the compute with the storage. The intra-DC bandwidth is incredibly fast, but it's also free, which is a good cost optimization. If you have a widespread client base, if you're serving mobile clients, things like that, you want to consider possibly using more than 1 storage account depending on what your scenario is. So you could have 1 in North America, you could have 1 in Europe, things like that. You need to understand the scalability targets. Again, they're different for Geo Redundant and Locally Redundant storage. 
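Before moving on to the server side, the client-side gotchas above—the startup-only ServicePointManager settings and bounded application-layer concurrency—can be sketched like this (the limit values are illustrative; a SemaphoreSlim throttle is used here in place of the custom task scheduler from the blog post):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Threading;
using System.Threading.Tasks;

class ClientGotchasSketch
{
    static void ConfigureNetwork()
    {
        // Must run before the FIRST request to storage; service points are cached.
        ServicePointManager.DefaultConnectionLimit = 100; // .NET default is 2
        ServicePointManager.UseNagleAlgorithm = false;    // helps small requests, e.g. queue messages
        ServicePointManager.Expect100Continue = false;    // saves a round trip per request
    }

    // Bound total concurrency instead of an unbounded Parallel.ForEach.
    static async Task ProcessAllAsync(IEnumerable<string> items, Func<string, Task> worker)
    {
        using (var throttle = new SemaphoreSlim(16)) // at most 16 in flight; tune per workload
        {
            var tasks = items.Select(async item =>
            {
                await throttle.WaitAsync();
                try { await worker(item); }
                finally { throttle.Release(); }
            });
            await Task.WhenAll(tasks);
        }
    }
}
```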
So understanding what the ingress and egress targets are will help you tune your application. And then enable logging and metrics. So we'll show you how to do this on the portal. This is incredibly helpful when you're trying to tune an application, specifically if you're using features like shared access or public access where you aren't making the requests to storage yourself. You can actually go through and audit what's my throughput? How many requests am I getting? What sort of errors? What sort of latency? Things like that. And then you can actually hook that up with the client API as well. Optimize what you send and receive. For each service, there's a bunch of different best practices you can do to actually send and receive less data when you actually need it. So for example in a blob, you could read a sub-range of the blob or you could mark the metadata to indicate some sort of state, right? This has been processed, this has not been processed, things like that. So now you can receive this data with a simple head request, right? Without having to download the whole blob and possibly look at its contents. If you do the head request you always get things like the content length. Things like that can be really useful for content management systems. For tables there's upsert. So if you don't care what's already on the server, you can do an insert-or-merge or insert-or-replace operation. And that saves you the round trip of having to say does this entity exist? If it does, I want to modify it. You can just put something on top, no matter what, and if it's not there, it'll just do an insert. So that saves you an entire round trip. The bonus there as well: an insert operation actually sends back the XML payload of the data to the client, and if you use an upsert operation, you don't get that. You actually have fewer bytes to download, fewer bytes to parse, lower CPU. And then projection. 
So when you're doing your queries, if I have a customer entity with 200 properties but I only need his email address, I can do a simple server projection and say I only need his email address and get that 1 column down. Less bytes to send, less bytes to process. And then queues. You can use update message semantics, you can checkpoint work, rather than using more messages, right? So I—for my first phase of work I could take the queue message out, I could delete it, I could insert a new message over here. If I simply update the same message, there are fewer requests to the storage service. Then you can tune the batch size as well. We already talked about the auto-scaling behavior. The only way that that can work is if you target your data at multiple partitions. So a blob is a single partition. But a table, for example, all the entities with the same partition key is now considered a partition. If you put all of your data with the same partition key, we've seen this done, there's no way the server can split that out and auto-scale. So the maximum target you're going to get is 2000 entities a second. One thing to note about that is that the entity group transaction feature, batch, will allow you to do atomic updates with multiple entities as long as their partition key is the same. So that's one of those things to take into account when you're considering how you shape your data. Control parallelism at the application layer. Again, we covered that. And then blend traffic types. So a lot of people when they're scoping out—you know, I'm going to have my blob worker here, my table worker here, and my queue worker here. This is going to handle this type of traffic. Generally speaking blob workloads can be very I/O intensive. And table workloads can be very CPU intensive. 
By co-locating those in the same role or in the same VM you can make use of the resources you're paying for because now you have something that's using a lot of CPU and something that's using a lot of network. All right, so let's get into the services. Blobs. So a blob is a single partition. The scalability target is 60 MB/s. Something key is choosing the right BlockSize. There's a couple of client API things there to kind of control that. We're going to do a couple of experiments to kind of see how we tune that. If you're using the client SDK, use a seekable stream when possible. The reason why is the BlockSize actually has some determination in the maximum blob length. And so by default the client will use a 4 MB block size. If you're using say a 10 KB blob, we actually will have to use the PutBlock and PutBlockList semantics internally because we don't know the length ahead of time. And what this means is you're actually going to have to do 2 requests when you could have done a single Put Blob, and the latency is almost double. So if you can, use a seekable stream. And we'll do a quick experiment to look at that as well. Use async for scale. If you're trying to get not just hundreds but thousands and thousands of requests per second, use async. This is much easier with the 2.1 Task APIs and async/await. If you're on .NET 4 you can actually install the async targeting pack and use async/await on .NET 4 without moving to .NET 4.5. And then the 2.0 clients actually have a much more efficient download resume feature essentially. And so as a failure happens on a large download they can simply say how much data did I get already? And start the rest of the request later. So if you can, for all these scenarios moving from a 1.x client to a 2.0 client has dramatically improved performance. So here are some of the improvements in the 2.1 release. Again, the new APIs. We have an asynchronous stream open. There used to be a synchronous call in there, so we fixed that. 
The .NET MD5 implementation, there's the flag right there if you need to set that back. The multi-buffer memory stream and then the IBufferManager, again pointing you to System.ServiceModel because that works great. You can hook it up in just a few lines of code. So here's just our first set of performance experiments. So we run these against production tenants. This specific test is a single XL VM, 50 concurrent workers, and they're just uploading 256 MB blobs. So we have 1.7, 2.0.51, 2.1 RC. And so we've driven down latency in some cases by 16% for upload, 23-37% for download, and the CPU has dropped by almost 40% in some cases. So, again, we highly recommend using the latest thing. One of the cool features for async, specifically when you're using network streams, which a lot of people are doing in middle tier services, is we have a new overlapped asynchronous copy. So traditionally you would have read and then written and read and written. We can actually do 2 of those things at the same time which helps with the latency here. All right, so throughput. A single XL VM can achieve a gigabit or a little above to the blob store. But a single partition, its target is 60 MB/s. So what this tells us is we want to access multiple blobs concurrently. And generally we want to use minimal blob-level parallelism for a couple different reasons. One, again, this distributes the load over many partitions and allows the server to scale. Two, as we have these longer connections the TCP window grows so we make more efficient use of our connections. Also anything under 64 MB can be uploaded as a single Put Blob. So you're not having to make many, many PutBlock requests and a PutBlockList. And then also there's a "long tail" problem when people are doing concurrency at the blob level. So for example if I have an 11 MB blob, and I have my parallelism set to 10, and 1 MB blocks. All things being equal, the first 10 blocks complete more or less at the same time. 
But then I have to wait for a single operation, right? I no longer have concurrency at that point. And so what you end up with when you do high blob-level concurrency as compared to multiple blobs at the same time is somewhat peaky throughput, right? And we want to try to overlap as much of that as possible to get more sustained throughput the entire time. So we're going to do a quick experiment. What pattern is faster to upload N blobs? I can do 1 concurrent upload with parallelism set to 30 or I can do 30 concurrent uploads with parallelism set to 1. So I already kind of gave you the answer. But let's look at the data. So we have an XL VM uploading 512 256 MB blobs with a 1 MB block size. This is roughly 128 GB. So our first test there, parallelism set to 30 with 1 blob. We averaged 50.72 MB/s, not bad. However, if we do multiple blobs, now we're close to 100 MB/s, and the server again can continue to scale. So this is almost twice as fast. The single blob also is always going to be bound by that single partition on a single server. So you may peak above this 50 MB/s, but you're never going to hit the full gigabit. All right, so that's parallel upload. And the next thing people always ask me is "where's my parallel download?" And we have a really good answer, it's not there. And it's not there on purpose. Here's why. It would require a much larger memory footprint. So if you did all these range requests we'd actually have to put them in memory and then shuffle them and make sure we write to your stream in the correct order. If the first range took some larger amount of time, we'd be blocked before we could write data, things like that. And then in our experimentation, it actually didn't provide any significant performance improvement. So let's look at this. We're going to download a 1 GB blob on a single XL VM. So I'm going to switch over to my VM here. So this is live against production. I've got my performance manager, and I've just got a tool. 
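Stepping back from the demo for a second, the winning pattern from the upload experiment—many blobs at once, parallelism 1 per blob—might be sketched like this (container and file names are placeholders, and as discussed earlier the overall concurrency should still be bounded at the application layer):

```csharp
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Blob;

class ParallelUploadSketch
{
    static Task UploadManyAsync(CloudBlobContainer container, string[] files)
    {
        // One block in flight per blob; concurrency comes from many blobs.
        container.ServiceClient.ParallelOperationThreadCount = 1;

        var uploads = files.Select(path =>
        {
            CloudBlockBlob blob = container.GetBlockBlobReference(Path.GetFileName(path));
            blob.StreamWriteSizeInBytes = 1 * 1024 * 1024; // 1 MB blocks, as in the experiment
            return blob.UploadFromFileAsync(path, FileMode.Open); // File upload API from 2.1
        }).ToArray();

        return Task.WhenAll(uploads);
    }
}
```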
I've already uploaded 128 GB—or, sorry, a 1 GB blob. And all this guy does is he's going to download this blob. So if you look at this, right now I'm at 1.5 Gb/s. Which is not bad, this is over a single connection. And again we're down to 16% CPU here, which is also not bad. So again there's not much performance benefit in doing this parallelism on the download. And so we highly recommend just doing the single get when possible. Again you get the fault tolerance, and in the case of an error we can just continue from where we got the last data. Content MD5 is there to make sure that through these different faults the data hasn't been corrupted. So you can make sure that you've got your data back correctly. So this was just a quick thing to point out. We were downloading a GB in just under 8 seconds. This was actually 130 MB/s. One quick note about this, this is more than double the scalability target for a single blob. We are a multi-tenant system. And so any time you are achieving performance above the scalability targets, which is possible, you need to know that the load balancer is a live operation. So you can't take a dependency on anything higher than these targets because tomorrow as things get load balanced you may be co-located with other data. And you'll get back down to those targets of say 60 MB/s. All right, non-seekable streams. So here's an experiment: upload 10,000 10 KB blobs. We actually had an internal customer that was doing this. They were just proxying data. They had a middle tier. Everything that they got from a network stream they'd just throw into the client, persist it into storage. The non-seekable part though was forcing this PutBlockList, and they were seeing high latencies. So this is the exact underlying stream implementation. We just overrode it to set the flag to true or false for CanSeek. So there's my single Put Blob, which is a seekable stream here. And then there's my non-seekable stream. 
Again, for small blobs we highly recommend, if you can, to use a seekable implementation. The other thing to note when you're using the blob stream implementation, the non-seekable stream, is that all the data has to be pre-buffered so it can be dispatched potentially for parallelism. So you're actually having to copy data into a buffer and then copy it again to the network. Whereas if you're doing a Put Blob, you're reading the data directly from the source, writing it directly to the destination. All right, so let's look at the block read size. So this is a 256 MB block blob with 4 MB blocks. Now we can read it in a variety of different ways. So the first one here is a single get. In this experiment, we're at about 85 MB/s. As we move to 4 MB reads, we're now down to about 27 MB/s. You can see 2 MB, 1 MB, 512 KB, 256, 128, all the way down to 64 KB reads, I'm down to 8.69 MB/s. And the reason why is because I'm doing multiple, very tiny requests. I have to authenticate every single request. There's some latency potentially to establish the connection. So if you're going for throughput, using the longer existing connection and reading more data is a huge win. Something else to note here. When you're choosing your block size, a BlockBlob can contain 50,000 blocks. And so this actually limits the upper end of how much a blob can hold. So for example if I use a 4 MB block size, my blob can be 195.31 GB. If I use 2 MB, I'm down to 97. If I use a 128 KB block size, I'm at 6.1 GB. It's different for PageBlob. A PageBlob can be a TB. But when you're choosing the upload block size, this is something to consider. All right, so let's look at tables. So what's new in the 2.1 release? We have an efficient IQueryable with a bunch of really cool tricks and shortcuts for some NoSQL-type scenarios. Compiled serializers, again, the benefit of run-time reflection so you can use client types and other types that we don't have access to at compile time. 
But with 40-50% less CPU utilization, depending on how complex the type is. We also have some more flexible serialization options. We have an Ignore attribute, so if you don't want to persist that one property you can just stick that on there. And we've also exposed the default serialization logic as a couple of static methods on the TableEntity type. So now you can persist 3rd party objects without having to have them derive from TableEntity or implement ITableEntity. We've got a really cool entity adapter example that we'll show later, where I can set up a service with a DTO object that's served to my clients, and they'll never have to know about Azure Storage or what's on the back end. And then I can persist those objects in place—without copying them to a different type—to Azure Storage, and read them back as well. And again, the multi-buffer memory stream and buffer pooling. So let's look at some improvements from 1.7. This is just the complex single entity test. We have an entity with 30-some-odd properties of all the different supported types, and we're going to do uploads, downloads, deletes. So we have 1.7, 2.0, and then the 2.1 as well. We've improved latencies in some cases 17-29%. CPU is down by close to 40%. Again, this is a single operation; we have similar improvements for entity group transactions, or batch. So again, I highly recommend using the later client. Best practices for tables. Key selection is the number one, most important thing you will do when deciding to use table storage. So, again: figuring out how you're going to access your data, how you're going to store it, what queries you're going to do. And then, again, thinking ahead for scalability. If everything has the same partition key, the server can't load balance. If you know how you're going to access the data, you want to store it in a way that you're hitting multiple partitions as evenly as possible, so that the auto-scaling can get the most for you.
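One common way to get the even partition spread just described is to prefix the natural key with a stable hash bucket. This is a hypothetical sketch (the bucket count and function names are my own, not from the talk), assuming your access pattern lets you recompute the bucket from the id at read time:

```python
import hashlib

NUM_PARTITION_BUCKETS = 16  # assumption: tune to your expected load

def partition_key_for(customer_id: str) -> str:
    """Prefix the natural key with a stable hash bucket so writes spread
    across multiple partition ranges instead of one hot partition."""
    digest = hashlib.md5(customer_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % NUM_PARTITION_BUCKETS
    return f"{bucket:02d}_{customer_id}"

# The same id always maps to the same partition, so point lookups
# stay cheap, but different ids fan out across buckets.
print(partition_key_for("customer-42"))
```

The trade-off is that a range scan across all customers now has to visit every bucket, so this only makes sense when, as the talk says, you know your queries up front.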
Again, efficient queries—we'll look at some different examples of how to query and how they rank in performance. Send and receive the smallest amount of data; we'll do a couple of experiments around the merge/upsert. And then server-side projection. This is really, really helpful, especially on client devices that have slower, spotty internet, right? Taking just a small amount of data—say, my gamer score for my application—is much, much faster than having to download the entire entity, which could be up to 1 MB. And then there's a new table service layer in the 2.0 clients. We prefer this over the WCF implementation simply because it's faster, more extensible, and there are a lot more performance improvements coming there in the future. All right, so let's look at query design and selectivity. The selectivity of a query refers to, essentially, how many data items I have to iterate and how many are coming back to me, right? And so you want to limit that as much as possible to drive latencies down. The key thing to remember is, again, you're possibly being served by multiple partition servers. At every server boundary you'll get what's called a continuation token, which essentially is returned to you saying, "I've gotten all the data off this server; if you need to check the next server, here's the token to go back to that server." And the execute methods in the client will handle all this for you. You'll just notice, all of a sudden, every now and then there will be a little latency blip as it makes a second request. They can also be returned when you hit 1,000 entities or a 4 MB payload. So if you're selecting essentially all the data, you're going to get a lot of continuation tokens and make second and third and fourth requests, et cetera. Table scans are expensive. You want to avoid these for latency-sensitive scenarios. If you have a very small amount of data in a table, that's fine.
But if I query, for example, on a non-indexed column, it means the server has to check every piece of data on every single partition across N servers that are possibly serving your data. So as your data gets bigger, these queries get slower. And so we'll look at the performance of some queries. The fastest thing you can do is access an entity by its identity, essentially—the PK and the RK, the partition key and the row key. It's the fastest thing. Below that, a row key range query: I've identified a fixed partition key, and then I have a range for the row key—an upper and a lower bound—so I know when to stop iterating on the server. Next below that, I have a fixed partition key and a row key bounded on one side only. So I'm greater than C, but I still potentially have to check the rest of the data in that partition. And then a multi-partition query. So it's bounded between a few different partitions, potentially, and I have some filter on the row key as well. This could be multiple servers with some server hops. And then a query on a non-indexed column. So if I do a query on my first name, I am going to have to check all my data. All right, so for throughput. The number one thing for throughput for tables is to use a good batch size. Again, you do need to have the same partition key when you're doing batch operations. But once you do that, you can drive down the per-entity latencies pretty significantly. So we'll do an experiment around that as well. A batch can be 100 entities and 4 MB in size. And again, insert operations return the entity information back to you, which you have to download and parse. Fewer bytes to read and fewer bytes to parse means lower CPU usage, so you get more out of your application. For latency, again, query selectivity: try to limit your queries as much as possible to the fewest pieces of data to iterate. And with projection, take only what you need.
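The batch constraints just stated—one partition key per batch, at most 100 entities, at most 4 MB of payload—lend themselves to a small planning step before you send anything. Here's a language-agnostic sketch (function and parameter names are mine, not the library's):

```python
MAX_BATCH_ENTITIES = 100        # table batch limit from the talk
MAX_BATCH_BYTES = 4 * 1024**2   # 4 MB payload limit from the talk

def plan_batches(entities, partition_key_of, size_of):
    """Group entities by partition key (a batch must share one PK),
    then split each group into chunks respecting both limits."""
    groups = {}
    for e in entities:
        groups.setdefault(partition_key_of(e), []).append(e)

    batches = []
    for pk, group in groups.items():
        current, current_bytes = [], 0
        for e in group:
            sz = size_of(e)
            # Flush the current chunk if adding this entity would
            # exceed either the count or the byte limit.
            if current and (len(current) == MAX_BATCH_ENTITIES
                            or current_bytes + sz > MAX_BATCH_BYTES):
                batches.append((pk, current))
                current, current_bytes = [], 0
            current.append(e)
            current_bytes += sz
        if current:
            batches.append((pk, current))
    return batches
```

Each `(pk, chunk)` pair would then become one entity group transaction; anything crossing partitions simply becomes a separate batch.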
You also want to avoid any expensive serialization or deserialization logic; that has to be done as part of your query and every time you're serializing an entity. So if you have some logic in a custom serializer, you want to try to tune that, because you're going to pay that cost every single time. Again, we have optimized away the reflection hit, but if you're doing something similar, you may consider using some of the functions that we've exposed, or even writing that code yourself. All right, so let's do a quick batch size experiment. So I have a utility here—again, this is against a production tenant—and it's just going to spin up a whole bunch of workers that are going to insert data at various batch sizes. So you can see them at 1, 5, 10, 15, all the way up to 100. And if you look, these lower single-entity inserts are happening in, say, 10-15 ms. As I go up, my latency is increasing to about 100 ms, but the per-entity latency goes from, say, 15 here down to 1.8. So I'm sending more data at a time. Do you have a question now? Oh sorry, yes I can. [makes screen bigger] Again, that's live. And so I'm going from 15 down to about 1. We're going to make available all the code for all this stuff, so you can download it on the Channel 9 site, hopefully, and play around with it yourself. But this is just showing you: if you can, batching is a great opportunity to maximize throughput. Let's close this guy down. And then I have the chart here of the kind of sanitized numbers—you can see the latency per entity in the blue line and then the overall latency of the batch request. So once you get to about a 20 or 30 entity batch size, depending on how big the entities are—we're doing 1 KB entities in that example—you get to the point where the overhead of making the request is now distributed over enough entities that you're getting the most throughput out of your application.
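The amortization curve described above—per-entity latency flattening once the batch reaches 20-30 entities—follows from a simple model: each request pays a roughly fixed overhead that gets divided across the batch. A toy sketch (the constants are illustrative guesses shaped to the demo's numbers, not measured values):

```python
def per_entity_latency_ms(batch_size, fixed_overhead_ms=14.0, per_entity_ms=0.9):
    """Toy model: fixed per-request overhead amortized over the batch,
    plus a constant per-entity processing cost."""
    return fixed_overhead_ms / batch_size + per_entity_ms

# Matches the shape of the demo: ~15 ms for a single insert,
# flattening out once batches reach roughly 20-30 entities.
for n in (1, 5, 20, 100):
    print(n, round(per_entity_latency_ms(n), 2))
```

Past the knee of this curve, bigger batches mostly just raise the whole-request latency without improving per-entity cost much, which is why the talk settles on 20-30 entities for 1 KB payloads rather than always maxing out at 100.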
There are some resources in the back about tables specifically. We show a lot more about key selectivity in a huge blog post, so if you need more research, it's there. All right, so Insert versus InsertOrReplace. Again, we talked about this optimization, so I'll do a quick example. I'm going to spin up a bunch of workers that are going to insert entities and a bunch of workers that are going to InsertOrReplace them—the upsert semantic. What you can see here is my inserts: I'm getting about 3,300 entities a second. With InsertOrReplace I'm at about 5,400 entities a second. So, again, if this makes sense for your application, I highly recommend using it. [silence] All right, so now we get to jump into some code. All the cool features in the next client library coming up—again, the RC will be available soon on NuGet and GitHub. The IQueryable; we're going to show you how to serialize 3rd party objects; the simplified projection; and then a new thing for the IQueryable: dynamic, type-safe query construction. And then the compiled serializers. So, let's see if I jump into here. All right, so I have a simple application set up here. The first thing: I am going to pretend I am a middle-tier service. So I have these CustomerDTO objects. These are what are bound to a UI, or I'm sending them over a service to a client. Typically what someone would have had to do in the middle tier is have the entity object that they're persisting to their persistence layer, and then the entity that they're handing out to their clients, and then map between them. That can be a real pain when you're trying to version these entities in the future—you have to go and write new code when somebody adds a new property, things like that. So I'm just going to insert a whole bunch of—well, I have 3 different CustomerDTO objects here. Something key to look at: this has no knowledge of Azure, of storage, anything about the table service layer. It's just data.
So I have a bunch of properties there. In this same table I have an OrderEntity. Now this is a back-end entity, so this does derive from TableEntity—it knows something about storage. So this means it has a PartitionKey and a RowKey. And now I have this idea of an entity adapter, so we're going to look at that. What this lets me do is have this wrapper object that implements the interface to basically say, "I know how to read or write to Azure Storage." And it can still use the optimized reflection logic to do this serialization for you, so you don't have to write that code yourself. So what I have here is a constructor—I take in the object—and then I have PartitionKey and RowKey properties, which I need, plus Timestamp and ETag. And then if you look at the Read and the Write, we have these new methods, ReadUserObject and WriteUserObject. So now you can pass in an instance of any type in .NET, and you'll get that same reflection semantic: do I have a public get and set? Is it a property type I know how to serialize? We even look for the Ignore attribute. So you could persist framework objects, shapes from the UI, anything you want. And if you needed to, since this is virtual, you could actually customize this logic. So you'll get the dictionary, for example, on the right, and then you can take things out or mutate it however you like. Now to make this work with Azure Storage, we need to have a PartitionKey and a RowKey. So what I've done in this example is I actually have these mappers. They're just a set of 4 different funcs and actions. This says: given an object of type T, give me my PartitionKey. And to make this even more simple, I've actually created this concurrent cache. So now what I can do is set up the mappers once for a type—basically register it—and I never have to worry about mapping to a PartitionKey again. So in this example I'm going to use my CustomerID as my PartitionKey.
As you can see, this is how it writes it; when it reads it back, it will set the PartitionKey to the CustomerID. And for my RowKey I'm using LastName;FirstName. So I'm going to set that up, and now I'm going to execute all these operations. You can see I'm doing a Select, I'm creating an InsertOrReplace operation, and I'm just wrapping these entities. Okay, so now I want to retrieve them. Right now we're just going to create a query—this is our new IQueryable with a DynamicTableEntity. The only note about this is that you have to use this different query syntax, because the entity wrapper object doesn't expose the property types you may want to query on. So you can use this different syntax to construct a more complicated query, or you can use the 2.0 implementation of the query constructors where you just pass it, essentially, a string. It's up to you. The cool thing to note here is that I have this resolver. So IQueryable has this idea of projection where you'd say Select, and it would look at what you passed in, what you accessed, and it would basically say, "Server, he's accessing these 3 properties; therefore, he only wants these 3 properties." And on the client side it would just populate those 3 properties. We wanted to separate out client-side projection, which is mutating the data when you get it, and server-side projection, which is essentially filtering what data the server sends to you. So we have this resolver, and what this lets you do is peek at the data before you decide what entity type it's going to become. So what I have here is this AdapterResolver—we can look at this. All it's going to do is create one of these new wrappers, set the different keys, call ReadEntity, and return the wrapped object, essentially. So now, if you look at the type of this, [silence] it's actually of type CustomerDTO, not EntityWrapper. So I can have queries.
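The adapter and mapper registration described above is a C# pattern in the talk, but the shape of it carries over to any language. This is a hypothetical Python sketch of the same idea—register key-mapping functions once per type, then wrap storage-agnostic objects in an adapter that supplies the keys (all names here are mine, not the SDK's):

```python
# Hypothetical names throughout; the talk's C# version uses funcs/actions
# registered in a concurrent cache keyed by type.
_mappers = {}

def register_mappers(cls, pk_of, rk_of):
    """Register once per type: how to derive PartitionKey / RowKey."""
    _mappers[cls] = (pk_of, rk_of)

class EntityAdapter:
    """Wraps any plain object, exposing the key properties the table
    layer needs, while the inner object stays storage-agnostic."""
    def __init__(self, inner):
        pk_of, rk_of = _mappers[type(inner)]
        self.inner = inner
        self.partition_key = pk_of(inner)
        self.row_key = rk_of(inner)

    def write_entity(self):
        # Stand-in for WriteUserObject: serialize the public attributes.
        return dict(vars(self.inner))

class CustomerDTO:
    """Plain DTO with no knowledge of storage, as in the demo."""
    def __init__(self, customer_id, first, last):
        self.customer_id = customer_id
        self.first = first
        self.last = last

register_mappers(CustomerDTO,
                 pk_of=lambda c: c.customer_id,
                 rk_of=lambda c: f"{c.last};{c.first}")

a = EntityAdapter(CustomerDTO("c1", "Ada", "Lovelace"))
assert (a.partition_key, a.row_key) == ("c1", "Lovelace;Ada")
```

The payoff is the one the talk emphasizes: the DTO type never learns about storage, and adding a property to it needs no new mapping code.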
I could literally return this IEnumerable to somebody else, and they would never know that I did anything against storage. All right, so you can see my entities are inserted. So now let's look at the simplified projection. Again, with table storage you can have 255 properties. These entities can be quite large, and you don't want to have to write select new, a = a, b = b, c = c, d = d—and execute this huge lambda every single time you get the data back. So we have this shortcut syntax that just says TableQuery.Project. You just pass in whatever columns you want—this is equivalent to having written select new, a = a, b = b, c = c, et cetera—and you don't have to worry about this extra delegate execution when you do get the data back. So in this case, I'm querying back all that data, I'm going to resolve it, but I only want the first name and last name back. So if I look at the customers here, I can actually see I have CustomerID as well, and the reason why is that this was the PartitionKey. PartitionKey, RowKey, Timestamp, ETag—those always get sent back to you because they are reserved properties. And then I have my FirstName and just my LastName. All right, so we can see my projected data now. All right, so now my application has populated a bunch of orders for all these customers. Now again, the OrderEntity doesn't share any properties with the CustomerEntity, but they're in the same table. And I want to do a more heterogeneous query. So how do I do this with IQueryable? So this is the logic we have. We create the query with a type of DynamicTableEntity. If you're not familiar with DynamicTableEntity, it basically defines the reserved properties—PK, RK, Timestamp, et cetera—but it has a property dictionary. So you can just say, here's my property name, and I want to access it as this type. So for example, this is the StringValue accessor. So what we've done here is create this query.
I want to query against the property FirstName as a string value, and I want to query against the property AccountBalance as a double value. So I've created a filter clause for my CustomerEntity, and then I have an Or with a filter for my OrderEntity—the OrderData is defined in my OrderEntity. And then I'm going to show this resolver. So this delegate gets executed for every single entity, and what this lets me do is, as I get data back, I get to peek at the data before I decide how I'm going to deserialize it. So what I've decided is that if my entity has an Amount property, then it must be an order; if it has a FirstName, then it must be one of these CustomerEntities. All I have to do is instantiate the correct entity and then deserialize it, and it will just work. So if I execute this—now all this logic is going to do is print out: if it's a customer, print it the customer way, and if it's an order, print it the order way. And so now you can see my query worked. I have customers and orders—again, zoom in here a little bit—in the same query. And I actually got those concrete POCO objects in the right class without having to define some uber type. Traditionally what you would have had to do is create a CustomerOrderEntity just to construct this query, which would have had to have the superset of properties from both types. So we didn't have to do any of that, which is nice. [silence] All right, so then there's one other interesting thing to note. If we go down to TableEntity, which is the base class for all our entities, we can see a couple of interesting things. There's this compiledReadCache and compiledWriteCache, and all this is, is a concurrent dictionary. So we can see, if we're writing, I have this for CustomerDTO and OrderEntity. And it essentially has created a func that knows how to serialize this type. It's as if you had written the code to go and say, I have property A, and I want to create a new property.
And I have property B, and I want to create a new property. We only have to do reflection once for the entire life cycle of your application—you never do it again. You can disable this if you like. And, again, you also get that functionality using the static methods to persist 3rd party objects. [silence] All right, so for tables there are a couple of resources there. Again, these slides will be available. There's the Deep Dive on the 2.0 library that shows you all the cool things you can do. And then there's a huge blog post about all the internals of tables: how to select your key, how to query, batch size, things like that. [Queues] All right, so queues. Yeah. [question from audience] So the 2.0 has the DynamicTableEntity, and you can use that now—it basically gives you a property bag. The IQueryable implementation and some of the performance optimizations to avoid reflection, that's 2.1, and that'll be available in a couple of weeks. So we'll have a bigger Q&A at the end, and I'll be down front and also at the booth. All right, queues. Nagle is important because a lot of messages are very small. You'll basically say, here's the URI for a blob that needs to be processed, things like that—so consider that. Design your application to update a message, if possible, rather than deleting and inserting a new one. This also helps with the scalability targets for a given queue. And then for scale, use multiple queues. We have a multi-queue example and some code there. If you need above 2,000 messages a second, you need to use multiple queues. There are different strategies on how to do that, which we'll cover. And then also, when you retrieve a queue message you'll get a PopReceipt, which is essentially how you go back and delete that message, which has become invisible. But you'll also get a dequeue count. This is important—it's a very good best practice to use—because for some reason somebody may have inserted malformed data.
So what happens if the message that I get causes my role to crash? Well, because it's guaranteed message delivery, after some period that message is going to reappear in the queue, because it hasn't been deleted. And then another role is going to get that message and crash. If I have enough "poison" messages, I could DoS my entire service. So on the client side, what I can do is look at the dequeue count and say I have some threshold—you know, I've tried to process this 3 times, and every single time it hasn't worked. I may just take that message and stick it in table storage or somewhere else to let somebody investigate later, and then remove it from the queue, so that I can keep my roles up without having this issue. So you can detect these "poison" messages. So for throughput: tune the message count for Get and Peek. You can get up to 32 messages. If you need to get the most, this is a great way to do it. The only thing you need to consider is your failure mode. If you do get 32 messages at a time, and you crash on the 1st message, then those other 31 messages are going to be invisible for the invisibility time. And so you may have compute waiting for messages while these guys are just sitting invisible for some period of time. So you want to tune that according to your scenario. You want to tune the invisibility time, and the initial invisibility time as well, so you can control how soon messages reappear in a queue. The shorter that period of time is, the more responsive your server has to be to go and make sure these things are deleted or updated. And then use multiple queues. So for multiple queues we have a couple of strategies. Something we see a lot is someone will say: I have 10 workers, therefore I have 10 queues; I'll stick a message in a queue, the worker will pick it up, and I know he'll get it because of the guaranteed message delivery. This works. The problem with this is that in some cases it doesn't evenly distribute the load.
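The poison-message guard described a moment ago—check the dequeue count against a threshold, quarantine and delete rather than crash again—can be sketched in a few lines. This is a language-agnostic sketch with hypothetical callback names; the threshold of 3 comes from the talk's example:

```python
MAX_DEQUEUE_COUNT = 3  # threshold from the talk's example

def handle(message, process, quarantine, delete):
    """Poison-message guard: if a message has already failed several
    times, move it aside for investigation instead of crashing again.

    `message` is a dict with a 'dequeue_count' field; `process`,
    `quarantine`, and `delete` are caller-supplied callbacks
    (e.g. quarantine could persist the message to table storage)."""
    if message["dequeue_count"] > MAX_DEQUEUE_COUNT:
        quarantine(message)   # park it somewhere for later review
        delete(message)       # remove it so workers stay healthy
        return False
    process(message)
    delete(message)
    return True
```

If `process` raises, the message is deliberately not deleted, so guaranteed delivery makes it reappear after the invisibility timeout with its dequeue count incremented—which is exactly what eventually trips the guard.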
So if I insert all the messages in queue 1, workers 2 through 10 may sit idle because they're looking at a different queue. And so it's possible to have peaky performance across VMs. This is simple, but it also has a downside when it comes to scaling. If I need to adjust the size of my deployment to more or fewer workers, I potentially have to add or remove queues, in which case I have to drain the messages to make sure I don't lose any information—some of those might be invisible. Another approach is to use round-robin queues. So we have a code sample that will be up where basically you can define N queues. We have them all with the same prefix, so you can list them in an efficient way and create and delete them together. But then you have your workers access these in a round-robin fashion. And so now you can insert a message into any queue and know that all your workers will evenly distribute the load of who's processing what. You can also tune this. If you have a scenario where you have a bunch of lightly utilized queues, you may have a little bit longer latency, as the client may make multiple requests across a few queues to make sure that the queues are all actually empty. But in a high-throughput scenario, now you can have multiple queues each getting up to 2,000 messages a second. [silence] For latency, again: tune the invisibility time and timeout, use a smaller message count for Peek and Get, and use multiple queues. So, Get Messages—what's the optimal number of messages to get? For time's sake, I'm just going to show you the results. Per-message latency improves by almost 12x. You can see there, when I do just a single Get, I'm just under 10—say 8 ms. When I get 32 at a time, I'm down to close to 1 ms per message. All right, debugging. This is huge when you have an application deployed and something is not happening the way you expect—your latencies are high, you're not getting the performance you expect. What do you do as a client to debug these issues?
So we have some features on the client and some features on the server, and they're meant to work together to give you a holistic idea of what's happening on your behalf and how you can tune it and make optimizations. So, client-side debugging. All the 2.0 libraries have this object called OperationContext, and it's an optional overload for every single API that makes a request to the server. This is all the debugging information you would ever want about all the requests that are made. Some APIs make multiple requests to the server. So for example, if I have a parallel blob upload that's making 8 requests at the same time, what this guy will do is actually track all the requests. It'll have a collection of these RequestResult objects that have the status code, the service request ID, some information about the start and end time, and any exception information. You can add any custom user headers—so if you're going through a proxy or something like that. And you can also get an event every time it sends or receives. So if I sent a request or I received a response, I can actually get a hold of the web request or the response to check headers, do things myself. And then, most importantly, there's something there called the ClientRequestID. What this lets you do is specify a string as the request ID. So you may have some logic—maybe this is user name and application, or something like that. Now you can track this request round-trip, client side and server side. In 2.1 we've extended this. We have a lot of features targeting this OperationContext, so we've enabled logging. And we've enabled kind of a unique model here where you have opt-in or opt-out. What's very common in .NET is you say, I want logging, so I'll enable logging on the blob class, and I'll just get this huge fire hose of information for all of my blob requests. In some cases you want this.
When you're debugging applications at scale, sometimes the logging makes the issue go away, because it generates so much information you have to persist somewhere. And so we wanted to provide clients with very targeted, vertical logging across the layers. So, again, you can enable it for the entire application at verbose, or you can enable these targeted logging scenarios. So in the app config I have set up myListener with the value as verbose—so the maximum thing the listener will record is verbose. And then in the client there's a default logging level I can set for the application. So now I have myListener set up to log everything, and I can tell my application: actually, I don't want to log anything unless I opt in to it. Now what I can do—and we'll have a quick sample of this—is set my logging level for a vertical request and get a very targeted log. All right. So you can see in this application I have my default logging level set to off. And now what I'm going to do is make a request that I know should fail. So I create a context and I set its logging level to verbose. Now, again, this application could be at scale—it could be doing thousands of requests a second—but this one request, for whatever reason, is having issues. So I'm going to set the log level. Something else to notice is that the ClientRequestID, because I didn't set it, was automatically generated as a random GUID for me. So I automatically have that affinity to coordinate my client- and server-side logs. All right, so I'm going to try to upload a blob that's 8 MB, using a very small block size and high parallelism, in 2 seconds. And most likely this is going to fail. [silence] All right, so I got an exception as expected. These exceptions have the request information for the thing that failed, so I can see the status code is 408; the client could not finish the operation.
But if I look at this context object now, [silence] I can see I have 16 different requests that were made on my behalf. I have the single ClientRequestID, I can see the start and the end time, things like that. And I can start looking at every single request. So this one was successful—Created, status code 201. Here's the ServerRequestID. This is important because now, for every subrequest, I can actually look up and see what the server saw as well. And then I could go down and actually find the one that exceptioned out if I wanted to. And, again, that information is already pulled out in the exception. So now—this is great—I have an example client log here. I formatted this for ease of use; you probably wouldn't use as many lines. But I have my request ID and all the different phases of execution. The StringToSign is really useful when you're debugging authentication issues, shared access signatures, things like that. So what I'm going to do is search for the exception. [silence] Let's see here, let's take this one. So I have this ClientRequestID, and in storage I've enabled logging on my account—we'll get to that in a second. And after some period of time it's flushed out these logs to blob files that'll be automatically GC'd on my behalf. So now what I did is go and download those server logs so I can see what's happening on the server. What did the server see in this event? So I have a sample log here. You can see all the different information. Sometimes this is very easy to format and open up in Excel, so I formatted this one just for ease of use. And we can see pretty much everything about all the requests from my account. So I can see I did a bunch of PutBlocks, and I had a bunch of successes. These are 2 very key fields here: this is the storage server latency, and this is the application end-to-end latency. So this is important.
The storage latency is how long it took the storage server to process your request, and the end-to-end is basically 1st byte to last byte. So if you're on a very slow network, you can see here, this took 247 ms to round-trip a request, but the storage server only took 7 ms in this case. As I scroll over here, I can see the authentication type—it wasn't anonymous, it wasn't shared access, it was authenticated. You can get the URI. There's stuff about the MD5, as well as information about the size—so the requests were 16 KB, as expected. And then I can actually get the user agent, so I can see all the different clients, potentially apps, that are accessing storage on my behalf. And then the ClientRequestID. So we already decided the c946 is the issue I'm trying to debug. So I can look up my logs and filter through. And as I scroll down, I see a couple of these guys with exceptions. So it wasn't a success—this guy was a network error, 500. I can see the end-to-end latency is really high. And there actually isn't an MD5 there at all, because we didn't actually get all the data; it timed out. So I tried to start a request, and it just failed because the client timed out. So with the client-side logs and the server-side logs, you can basically debug most issues you'll ever run into, especially around performance, latency, things like that. And you can coordinate those 2 logs together. I'd also recommend tuning in to the Build talk in a few weeks, because they're going to talk a little bit more about some of the new features for logging. [silence] All right, so the server side: storage analytics. You get metrics and logging per abstraction, per account. You can enable this very simply via the client API or via the portal—there are just a few buttons. And then there's a retention policy you can specify, so you don't have to go and delete these guys periodically. That'll be done for you, and the transaction cost is free.
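The two latency fields called out in the log walkthrough relate very simply: whatever part of the end-to-end time the server didn't spend is network and client time. A small sketch of that triage (the threshold heuristic is my own, not from the talk):

```python
def network_overhead_ms(end_to_end_ms, server_latency_ms):
    """End-to-end (first byte to last byte) minus the server's own
    processing time: what's left is network and client time."""
    return end_to_end_ms - server_latency_ms

def looks_network_bound(end_to_end_ms, server_latency_ms, factor=10):
    """Heuristic (my own threshold, not from the talk): flag requests
    where wire/client time dwarfs the server's processing time."""
    return network_overhead_ms(end_to_end_ms, server_latency_ms) > factor * server_latency_ms

# The log example from the talk: 247 ms end to end, only 7 ms on the
# server, so roughly 240 ms was spent on the wire or in the client.
assert network_overhead_ms(247, 7) == 240
assert looks_network_bound(247, 7)
```

Running this split across a downloaded `$logs` file quickly separates "storage is slow" incidents from "my network or client is slow" incidents, which is usually the first question in a latency investigation.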
And the scalability targets—because the logs and metrics are stored in your account, they actually have a separate target, so they're not counted against your scalability target of 20,000 transactions a second, things like that. Again, you can enable it via the portal. This is something I highly recommend when you're using a shared access signature, or basically allowing any client you don't control to make requests on your behalf, because you'll have no idea what sort of request load you're serving if you don't look at what the server is seeing. The logs: they're stored in blobs, per service. The format of the blob name is by date—year, month, day, things like that—so you can actually target what date range of logs you want to download. A request typically appears within the log within 15 minutes. Something else to note is that this is best-effort logging. It's all done by the front end, and it's taken out of the critical path of your request—so it doesn't slow down your request—and written to the logs asynchronously. So if something catastrophic happened and a front end crashed, it is possible to miss a request. So if you're using this for billing or something like that—versus general-purpose auditing, throughput, things like that—that's just something to note. And then you can configure the logging level separately for the different services and for the operation types. So maybe I only want to log the Puts or the Deletes. [silence] Again, best-effort logging. So this is the portal screenshot. All you've got to do is go in there, click the different request types, and specify the interval. The retention policy is very convenient because you never have to worry about accumulating too many logs over time. Metrics. Why turn on metrics? This will give you very high-level summary data about each service. This will let you know about transaction counts, GetBlob requests, things like that. I encourage you to go look at this.
There's a blog post that details all the different fields that are logged and more details about that. Very useful in knowing what's going on with your storage account. Capacity metrics are currently provided for blobs only at this time. They update once a day, and you'll be able to see "I'm using N GB or N TB, and these are the number of objects I have stored." So, again, this is how you enable that via the portal. [silence]

They're in separate namespaces, $logs and $metrics. The metrics are stored in a table, and the logs in a blob container. Something to note is that some client tools, when they see the $, treat that as some sort of reserved container and don't visualize it. So in some cases you may need to use the client API to download these; if you don't see them right away in some sort of client visualization tool, that may be why. The cost is just the capacity to keep the data and the transactions for generating and accessing that data. The retention policy and the writing of the data are all outside of your typical entities-per-second scalability target.

All right, so the summary. You are the expert in your scenario. You need to know what you need to achieve, how you need to scale in the future, how many transactions per partition and per account, what's the bandwidth? Again, these targets are different between Geo and non-Geo accounts. Also consider what your plan is for the future, different localities, things like that. We said it: a single account serves up to 200 TB and 20,000 transactions per second. A lot of very large clients are using multiple storage accounts. At that scale it starts to make sense to have 1 in Europe, 1 in North America, 1 in Asia, to be close to the consumers that are using that data. And the way it works, you can actually have multiple storage accounts tied to a single subscription, so you can go in through the portal and manage all of those together.
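When data is spread across multiple storage accounts like this, a common pattern is to pick the account for each object deterministically so lookups always land in the same place. A minimal sketch, with made-up account names:

```python
# Sketch: spread objects over several storage accounts once a single
# account's targets (about 200 TB and 20,000 transactions/s at the time of
# this talk) are a bottleneck. Account names below are hypothetical.
import hashlib

ACCOUNTS = ["myappstore0", "myappstore1", "myappstore2"]

def account_for(key: str) -> str:
    # md5 gives a hash that is stable across processes and restarts
    # (Python's built-in hash() is salted per run, so it would not be)
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return ACCOUNTS[int.from_bytes(digest[:4], "big") % len(ACCOUNTS)]

# the same key always maps to the same account
chosen = account_for("user-42")
```

Note that changing the number of accounts remaps most keys with plain modulo hashing; schemes like consistent hashing soften that, but the modulo version is enough to show the idea.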
Deploy the best practices that we talked about here. We'll see more and more of that put up on the blog; I have a link to the Azure Storage blog there. And then use the latest storage client libraries via NuGet and GitHub, especially if you're on one of the 1.x clients. At this point it's so much faster that it's actually saving people a lot of money, just by reducing the size of deployments in some cases.

This is also something interesting to look at. Nasuni just released their "State of the Cloud Storage Industry" report for 2013; this was done in February. There's a blog post there, and you can download the full report. Basically what they did is they looked at all the different public cloud storage providers, CSPs, in this study. They looked at Amazon, they looked at Google, they looked at HP, Rackspace, things like that. Two quick things to highlight there: Azure had the best read/write/delete speeds across a variety of sizes, the fastest response times, the fewest errors. The one we're really proud of, though, is at the bottom: Azure not only outperformed the competition, but was the only provider to post zero errors during 100 million reads and writes. We actually have multiple levels of redundancy throughout the system, and the DFS layer actually has data scrubbers just to look for bit rot, things like that. We take data integrity very seriously. So there's a lot of information there; I'd recommend looking at that report.

All right, so what's coming up? Future directions. CORS headers: if you need to serve static web content to a browser, especially when you're reading it from JavaScript, CORS is what you need. So this is something we'll enable so that you can actually have JavaScript applications accessing storage on your behalf from a different domain, a different origin. The Windows Phone library is in testing right now, so we should be seeing that shortly.
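What CORS actually buys you is a server-side check: the service compares the request's Origin and method against configured rules and, on a match, echoes back an Access-Control-Allow-Origin header so the browser lets cross-origin JavaScript read the response. A minimal sketch of that rule evaluation; the rule shape below is illustrative, not the storage service's actual schema:

```python
# Sketch: how a CORS rule is evaluated against an incoming cross-origin
# request. The dict layout is a hypothetical stand-in for the real config.
def allows(rule, origin, method):
    origin_ok = ("*" in rule["allowed_origins"]
                 or origin in rule["allowed_origins"])
    return origin_ok and method in rule["allowed_methods"]

rule = {
    "allowed_origins": ["http://www.contoso.com"],
    "allowed_methods": ["GET", "HEAD"],
}

ok = allows(rule, "http://www.contoso.com", "GET")      # matching origin
blocked = allows(rule, "http://evil.example", "GET")    # unknown origin
```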
It's based on the same code base as the desktop library, so a lot of the best practice and performance and scale and throughput work is in the phone library as well. We have a lot of performance work in flight on both the server side and the client side. Some of that there will be more blog posts about, and you'll read more about it; some of it will just get better over time.

We're expanding support for additional client platforms soon. For storage and .NET specifically, we have a WinRT library, we have a phone library coming out, a desktop library, and we also have a Java library. If you go to the GitHub account for Windows Azure, you'll actually see Node and PHP, and I think there's Ruby and a bunch of other languages. So we're expanding platform support greatly, especially for people that are running applications that need to target multiple mobile platforms.

Geo-replication for queues, and then also read from secondary. So on top of Geo-replication, we're allowing clients more control over when they access secondary storage. That lets them run certain types of read loads against the secondary data center, which is going to have a separate scalability target, essentially. So you have your 20,000 entities per second on your primary, and then you can also access the secondary in some cases as well.

Then expanded reach. There's a blog post there. They're building data centers incredibly fast and in new regions; they just recently announced an expansion in Asia. So you can go look at more details there. This is important depending on what clients you're trying to reach where.

And then resources. There's a whole bunch of links here: Getting Started and the pricing information. If you're working for a group that has an Enterprise Agreement, they probably have some Azure, and Azure Storage specifically, as part of that. If you have an MSDN account, you actually have some Azure and Azure Storage credits toward that as well.
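"Read from secondary" in practice means a read that fails against the primary endpoint can be retried against the geo-replicated secondary, accepting possibly stale data in exchange for availability and the secondary's separate scalability target. A minimal sketch; the "-secondary" host convention and the function names are assumptions for illustration:

```python
# Sketch: read with fallback to the geo-replicated secondary endpoint.
# The host-name convention below is an assumption, not taken from this talk.
def endpoints(account: str, service: str = "blob"):
    primary = f"https://{account}.{service}.core.windows.net"
    secondary = f"https://{account}-secondary.{service}.core.windows.net"
    return primary, secondary

def read_with_fallback(account, path, fetch):
    """fetch(url) returns bytes or raises IOError; try primary, then secondary.
    A secondary read may return slightly stale data (async replication)."""
    primary, secondary = endpoints(account)
    try:
        return fetch(primary + path)
    except IOError:
        return fetch(secondary + path)
```

Usage would look like `read_with_fallback("myaccount", "/container/blob.txt", my_http_get)` with any HTTP GET helper of your choosing.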
The storage team blog is a great resource if you're developing against storage. This is where all the new features are going to be announced, along with any sort of issues that we've encountered. The 2nd link from the bottom there is actually a .NET client update issue that we recently uncovered in very high-scale applications, where basically the behavior changed slightly, and some clients were actually exhausting all their open ports and latencies were going high. We pushed an update to the NuGet package immediately to address the platform issue, just adding a new best practice, so clients that are on the latest NuGet didn't really see this. I would highly recommend looking at this, because if you're doing any traffic over HttpWebRequest, this may affect you. It's something the .NET team is working on resolving right now, but that's a good read.

And at the bottom is the SOSP paper. If you want to know about the secret sauce, what's inside the Azure cluster, how the durable file system works, all the rest, there is a paper there as well as a talk by Brad Calder, who is the GM of storage. So I recommend that as well.

So we have a few minutes left for questions. Anybody? Also I'll be down here in the front. Yeah. [question from audience] Indexing. So you're talking about tables, or blobs as well? Yes. So there's a whole bunch of scenarios and features. Right now we don't comment on all the future stuff because of the different deployment schedules and things. But right now you have the 2 index columns in tables specifically, and then prefix listing for queues and blobs. What I'd recommend is looking at that "getting the most out of table storage" blog post. One of the things that we see is that people are using the PartitionKey and RowKey in a way where they could get more out of them. For example, I can actually put several index columns in a single key and delimit them. So if I was modeling the population of the world, I could do country; province/state; city; street.
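That delimited-key idea can be sketched concretely: pack the index columns into one key, then turn any prefix of it into a lexical range. Illustrative Python rather than the .NET table SDK; the upper-bound trick of appending a high character is one common way to express "starts with" as a range:

```python
# Sketch: several index columns packed into one key, queried by prefix.
# ';' as the delimiter follows the talk's example.
def make_key(*parts):
    return ";".join(parts)

def prefix_range(prefix):
    """Bounds for 'key starts with prefix' as a lexical range:
    prefix <= key < prefix + a character above any real one."""
    return prefix, prefix + "\uffff"

keys = [
    make_key("USA", "Illinois", "Chicago", "Main St"),
    make_key("USA", "Illinois", "Springfield", "Elm St"),
    make_key("USA", "Washington", "Seattle", "Pine St"),
]

# "everyone in Illinois" becomes a range scan over one composed key
lo, hi = prefix_range(make_key("USA", "Illinois") + ";")
illinois = [k for k in sorted(keys) if lo <= k < hi]
```

The same single key answers country-level, state-level, or city-level queries, which is the "multiple dimensions in one key" point made here.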
And then I could do a lexical query across everyone in the USA, or everyone in Illinois, or everyone in Chicago. So that 1 key can actually serve multiple dimensions in certain ways. The other thing that we do that I also recommend is, in some cases, storing data in a couple of different places depending on how you query it. When people are migrating data from SQL to Azure tables: it's the same storage as blobs, so you're paying cents per GB per month, and duplicating a single entity for a quick lookup is incredibly cheap; it's very cost effective. We can talk more about that as well later.

Any other questions? All right, again, I'll be in the front and down at the booth. Apparently if you activate your subscription for Azure, you have a chance to win an Aston Martin, which sounds awesome, so I'd recommend you do that. And then there's some additional resources. Channel 9 will be posting the presentation as well as the slides and the links, and we'll try to get the code out as well. And then please do an evaluation—oh, sorry. I guess there's chances to win a bunch of different prizes there. And then there is the QR code. So, again, thanks for your time, and I'll be down in the front if you have any questions.

Video Details

Duration: 1 hour, 13 minutes and 51 seconds
Country: United States
Language: English
Genre: None
Views: 10
Posted by: asoboleva99 on Jul 9, 2013
