Windows Azure Table - May 2009

By Manuel Hart,2014-05-27 15:12




     Jai Haridas, Niranjan Nilakantan, and Brad Calder
     May 2009

     Table of Contents

     1 Introduction
     2 Table Data Model
     3 Partitioning Tables
          3.1 Impact of Partitioning
               3.1.1 Scalability of the table
               3.1.2 Entity Group Transactions
               3.1.3 Entity Locality
          3.2 Choosing a Partition Key
               3.2.1 Entity Group Transactions
               3.2.2 Efficient Queries
               3.2.3 Scalability
               3.2.4 Flexible Partitioning
     4 Programming Tables
          4.1 Versioning
          4.2 Running Example
          4.3 Defining the Entity Class for the Table
          4.4 Creating a Table
          4.5 Inserting a Blog
          4.6 Querying Blogs
          4.7 Updating a Blog
          4.8 Deleting a Blog
          4.9 Entity Group Transactions
               4.9.1 Handling the response
               4.9.2 Errors
          4.10 Best Practices when using DataServiceContext
          4.11 Using the REST API
     5 Concurrent Updates
          5.1 Unconditional updates
     6 Pagination of query results
          6.1 Getting Top N Entities
          6.2 Continuation Tokens
     7 Consistency Model
          7.1 Single Table Consistency
          7.2 Cross-Table Consistency
     8 Tips and Tricks
          8.1 Retrieve latest items (simulating descending order)
          8.2 Retrieve using prefix
          8.3 Data Partitioning Example
               8.3.1 Micro Blogging Case Study
               8.3.2 Dynamically selecting the granularity of the PartitionKey
               8.3.3 Different Entity Kinds in the Same Table
          8.4 Upgrade and Versioning
               8.4.1 Adding a new property
               8.4.2 Deleting a property type
               8.4.3 Modifying a property type
     9 Windows Azure Table Best Practices
          9.1 Table Creation
          9.2 Asynchronous version of ADO.NET Data Services API
          9.3 DataServiceContext settings
          9.4 Partitioning scheme
          9.5 Unconditional Updates and Deletes
          9.6 Handling Errors
               9.6.1 Network errors and timeouts on successful server side operations
               9.6.2 Retry Timeouts and "Connection closed by Host" errors
               9.6.3 Conflicts in Updates
               9.6.4 Tune Application for Repeated Timeout errors
               9.6.5 Error handling and reporting
          9.7 Tuning .NET and ADO.NET Performance
               9.7.1 Improve Performance of ADO.NET Data Service Deserialization
               9.7.2 Default .NET HTTP Connections set to 2
               9.7.3 Turn off 100-continue
               9.7.4 Turning off Nagle may help Inserts/Updates
          9.8 Deleting and then Recreating the Same Table Name
     10 Summary

     1 Introduction

     Windows Azure is the foundation of Microsoft's Cloud Platform. It is the "Operating System for the Cloud" that provides essential building blocks for application developers to write scalable and highly available services. Windows Azure provides:

     • Virtualized Computation
     • Scalable Storage
     • Automated Management
     • Rich Developer SDK

     Windows Azure Storage allows application developers to store their data in the cloud. The application can access its data from anywhere at any time, store any amount of data for any length of time, and be confident that the data is durable and will not be lost. Windows Azure Storage provides a rich set of data abstractions:

     • Windows Azure Blob - provides storage for large data items.
     • Windows Azure Table - provides structured storage for maintaining service state.
     • Windows Azure Queue - provides asynchronous work dispatch to enable service communication.

     This document describes Windows Azure Table, which is the structured storage provided by the Windows Azure platform. It supports massively scalable tables in the cloud, which can contain billions of entities and terabytes of data. The system scales out automatically to thousands of servers as traffic grows.

     Structured storage is provided in the form of Tables, which contain a set of Entities, each of which contains a set of named Properties. A few of the highlights of Windows Azure Table are:

     • Support for LINQ, ADO.NET Data Services and REST.
     • Compile time type checking when using the ADO.NET Data Services client library.
     • A rich set of data types for property values.
     • Support for an unlimited number of tables and entities, with no limit on the table size.
     • Strong consistency for single entity transactions.
     • Optimistic concurrency for updates and deletes.
     • For queries returning large numbers of results or queries that time out, partial results are returned with a continuation token to allow the query to resume where it left off.

     2 Table Data Model

     The following summarizes the data model for Windows Azure Table:

     • Storage Account - An application must use a valid account to access Windows Azure Storage. You can create a new account via the Windows Azure portal web interface. The user will receive a 256-bit secret key once the account is created. This secret key is then used to authenticate user requests to the storage system. Specifically, an HMAC SHA256 signature for the request is created using this secret key. The signature is passed with each request to authenticate the user requests. The account name is part of the host name in the URL. The hostname for accessing tables is <accountName>.table.core.windows.net.
     • Table - Contains a set of entities. Table names are scoped by the account. An application may create many tables within a storage account.
     • Entity (Row) - Entities (an entity is analogous to a "row") are the basic data items stored in a table. An entity contains a set of properties. Every entity has two key properties, the PartitionKey and the RowKey, that together form the unique key for the entity.
     • Property (Column) - This represents a single value in an entity. Property names are case sensitive. A rich type set is supported for property values.
     • PartitionKey - The first key property of every table. The system uses this key to automatically distribute the table's entities over many storage nodes.
     • RowKey - The second key property of every table. This is the unique ID of the entity within the partition it belongs to. The PartitionKey combined with the RowKey uniquely identifies an entity in a table.
     • Timestamp - Every entity has a version maintained by the system.
     • Partition - A set of entities in a table with the same partition key value.
     • Sort Order - There is a single index provided for the CTP, where all entities in a table are sorted by PartitionKey and then RowKey. This means that queries specifying these keys will be more efficient, and all results are returned sorted by PartitionKey and then by RowKey.
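The Storage Account item above states that each request carries an HMAC SHA256 signature computed with the account's secret key. The following is a minimal sketch of that computation only. It assumes the secret key is handed to the application as a Base64 string and that the caller has already assembled the string to sign; the exact layout of that string is not given in this section, so the `stringToSign` parameter and the `RequestSigner` name are illustrative, not part of any library.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, not part of the storage client library.
public static class RequestSigner
{
    // Computes an HMAC SHA256 signature over stringToSign and returns it
    // Base64-encoded. Assumes base64Key is the Base64 form of the secret key.
    public static string Sign(string base64Key, string stringToSign)
    {
        byte[] key = Convert.FromBase64String(base64Key);
        using (var hmac = new HMACSHA256(key))
        {
            byte[] hash = hmac.ComputeHash(Encoding.UTF8.GetBytes(stringToSign));
            return Convert.ToBase64String(hash);
        }
    }
}
```

The result would be placed in the request's authorization header. Because the signature depends only on the key and the signed string, the same inputs always produce the same signature.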

     A table has a flexible schema. Windows Azure Table keeps track of the name and typed value for each property in each entity. An application may simulate a fixed schema on the client side by ensuring that all the entities it creates have the same set of properties. The following are some additional details about Tables, Entities and Properties:

     • Table
          o Table names may contain only alphanumeric characters.
          o A table name may not begin with a numeric character.
          o Table names are case-insensitive.
          o Table names must be from 3 through 63 characters long.
     • Property Name
          o Only alphanumeric characters and '_' are allowed.
     • An entity can have at most 255 properties including the mandatory system properties - PartitionKey, RowKey and Timestamp. All other properties in an entity have a name defined by the application.
     • PartitionKey and RowKey are of string type, and each key is limited to 1KB in size.
     • Timestamp is a read-only, system-maintained property which should be treated as an opaque property.
     • No Fixed Schema - No schema is stored by Windows Azure Table, so all of the properties are stored as <name, typed value> pairs. This means that two entities in the same table can have very different properties. A table can even have two entities with the same property name, but different types for the property value. However, property names must be unique within a single entity.
     • Combined size of all data in an entity cannot exceed 1MB. This size includes the size of the property names as well as the size of the property values or their types, which includes the two mandatory key properties (PartitionKey and RowKey).
     • Supported property types are: Binary, Bool, DateTime, Double, GUID, Int, Int64, String. See the table below for limits.


     We use the default limits for Http.sys, which enforces a 260 character limit on a URI segment. This limits the combined size of the partition and row key, since GetRow, Delete, Update and Merge require that the partition and row key be specified as part of a single URI segment. For example, the following URI specifies a single entity with PartitionKey "pk" and RowKey "rk":

     http://myaccount.table.core.windows.net/Customers(PartitionKey="pk",RowKey="rk")

     Due to the Http.sys limitation, the URI segment that names the entity, Customers(PartitionKey="pk",RowKey="rk"), cannot exceed 260 characters. To get around this problem, operations that have this limitation can still be performed using Entity Group Transactions, since in Entity Group Transactions the URI identifying the resource is part of the request body (see section 4.9 for more information).

     Property Type | Details
     --------------+------------------------------------------------------------
     Binary        | An array of bytes up to 64 KB in size.
     Bool          | A Boolean value.
     DateTime      | A 64-bit value expressed as UTC time. The supported range of values is 1/1/1601 to 12/31/9999.
     Double        | A 64-bit floating point value.
     GUID          | A 128-bit globally unique identifier.
     Int           | A 32-bit integer.
     Int64         | A 64-bit integer.
     String        | A UTF-16-encoded value. String values may be up to 64 KB in size.
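Some of the naming and size limits listed in this section can be checked on the client before a request is sent. The sketch below is illustrative only: the `TableLimits` class is hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, and the 1KB key limit is assumed to be measured on the UTF-16 encoding of the key (the encoding the type table gives for String values). The service remains the authority on what it accepts.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical client-side checks for limits described in this section.
public static class TableLimits
{
    public const int MaxPropertiesPerEntity = 255; // includes system properties
    public const int SystemPropertyCount = 3;      // PartitionKey, RowKey, Timestamp
    public const int MaxKeyBytes = 1024;           // 1KB per key (assumed UTF-16 bytes)

    // 3 to 63 alphanumeric characters, not starting with a digit.
    private static readonly Regex TableNamePattern =
        new Regex(@"^[A-Za-z][A-Za-z0-9]{2,62}\z");

    public static bool IsValidTableName(string name)
    {
        return name != null && TableNamePattern.IsMatch(name);
    }

    public static bool IsValidKey(string key)
    {
        return key != null && Encoding.Unicode.GetByteCount(key) <= MaxKeyBytes;
    }

    // userDefinedProperties excludes the three system properties.
    public static bool IsValidPropertyCount(int userDefinedProperties)
    {
        return userDefinedProperties >= 0
            && userDefinedProperties + SystemPropertyCount <= MaxPropertiesPerEntity;
    }
}
```

For example, `TableLimits.IsValidTableName("Blogs")` passes, while `"1Blogs"` (leading digit) and `"ab"` (too short) do not.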

     3 Partitioning Tables

     Windows Azure Table allows tables to scale out to thousands of storage nodes by distributing the entities in the table. When distributing the entities, it is desirable to ensure that a set of entities always stays together on a storage node. An application controls this set by choosing an appropriate value for the PartitionKey property in each entity. Applications need to understand the workload each partition will receive, and stress test with a simulated peak workload, to make sure they will get the desired results.


     Figure 1 Example of Partitions

     The figure above shows a table that contains multiple versions of documents. Each entity in this table corresponds to a specific version of a specific document. In this example, the partition key of the table is the document name, and the row key is the version string. The document name along with the version uniquely identifies a specific entity in the table. In this example, all versions of the same document form a single partition.

     3.1 Impact of Partitioning

     We now describe the purposes of the Table partitions and how to go about choosing a partition key.

     3.1.1 Scalability of the table

     The storage system achieves good scalability by distributing the partitions across many storage nodes. The system monitors the usage patterns of the partitions, and automatically balances these partitions across all the storage nodes. This allows the system and your application to scale to meet the traffic needs of your table. That is, if there is a lot of traffic to some of your partitions, the system will automatically spread them out to many storage nodes, so that the traffic load will be spread across many servers. However, a partition, i.e. all entities with the same partition key, will be served by a single node. Even so, the amount of data stored within a partition is not limited by the storage capacity of one storage node.

     3.1.2 Entity Group Transactions

     For the entities stored within the same table and same partition (i.e., they have the same partition key value), the application can atomically perform a transaction involving those entities. This allows the application to atomically perform multiple Create/Update/Delete operations across multiple entities in a single batch request to the storage system, as long as all the entities have the same partition key value and are in the same table. Either all the entity operations succeed in the single transaction or they all fail, and snapshot isolation is provided for the execution of the transaction. In addition, all other queries executing in parallel at the same time will not see the result of the transaction, since they will be working off a prior snapshot. Queries will only see the result of the transaction once it has fully and successfully committed.

     Entity Group Transactions require the use of the version header with the version set to "2009-04-14" or later. See section 4.1 for more details.

     3.1.3 Entity Locality

     The entities within the same partition are stored together. This allows efficient querying within a partition. Furthermore, your application can benefit from efficient caching and other performance optimizations that are provided by data locality within a partition. In the above example, all versions of the same document form a single partition. Therefore, retrieval of "all of the versions of a given document" will be efficient, since we are accessing a single partition. On the other hand, a query for "all versions of documents modified before 5/30/2007" is not limited to a single partition. Since the query has to examine all partitions, potentially across several storage nodes, such a query would incur a higher cost.
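The difference between the two queries above can be illustrated with an in-memory model. This is only a sketch of the idea, not of how the storage service is implemented: the `DocumentVersion` and `PartitionSketch` names are hypothetical, and ordinal string ordering of the RowKey is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity: one version of one document.
public class DocumentVersion
{
    public string PartitionKey { get; set; }  // document name
    public string RowKey { get; set; }        // version string
    public DateTime Modified { get; set; }
}

public static class PartitionSketch
{
    // One partition per PartitionKey value; entities in a partition
    // are kept sorted by RowKey (ordinal ordering is assumed here).
    public static Dictionary<string, List<DocumentVersion>> ByPartition(
        IEnumerable<DocumentVersion> entities)
    {
        return entities
            .GroupBy(e => e.PartitionKey)
            .ToDictionary(
                g => g.Key,
                g => g.OrderBy(e => e.RowKey, StringComparer.Ordinal).ToList());
    }

    // "All versions of a given document": touches exactly one partition.
    public static List<DocumentVersion> VersionsOf(
        Dictionary<string, List<DocumentVersion>> partitions, string documentName)
    {
        List<DocumentVersion> partition;
        return partitions.TryGetValue(documentName, out partition)
            ? partition
            : new List<DocumentVersion>();
    }

    // "All versions modified before a date": must examine every partition.
    public static List<DocumentVersion> ModifiedBefore(
        Dictionary<string, List<DocumentVersion>> partitions, DateTime cutoff)
    {
        return partitions.Values
            .SelectMany(p => p)
            .Where(e => e.Modified < cutoff)
            .ToList();
    }
}
```

`VersionsOf` is a single dictionary lookup, which mirrors a query that names the PartitionKey. `ModifiedBefore` walks every partition, which mirrors a query whose filter does not include the PartitionKey.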

     3.2 Choosing a Partition Key

     Choosing a partition key is important for an application to be able to scale well. There is a tradeoff here between trying to benefit from entity locality, where you get efficient queries over entities in the same partition, and the scalability of your table, where the more partitions your table has the easier it is for Windows Azure Table to spread the load out over many servers.

     3.2.1 Entity Group Transactions

     If your application needs to use entity group transactions, the PartitionKey needs to be chosen so that its granularity captures all of the entities you need to perform atomic transactions over. Depending upon your query needs, one general rule of thumb is to choose the PartitionKey so that it groups together only the entities that must be grouped to perform entity group transactions over them. This keeps together the entities that need to be operated on atomically, while potentially creating many partitions, which allows Windows Azure Table to load balance those partitions across its servers to meet your table's traffic needs.

     3.2.2 Efficient Queries

     We recommend that high-frequency, latency-critical queries use the PartitionKey as a query filter condition. Using the PartitionKey in the query filter limits the query execution to a single partition or a subset of partitions (depending upon the condition used), thereby improving query performance. If the PartitionKey is not part of the query, then the query has to be done over all of the partitions of the table to find the entities being looked for, which is not as efficient. The following are some rough guidelines and suggestions for how to choose a PartitionKey for your table for efficient querying:

     1. First determine the important properties for your table. These are the properties frequently used as query filters.
     2. Pick the potential keys from these important properties.
          a. It is important to identify the dominant query for your application workload. From your dominant query, pick the properties that are used in the query filters.
          b. This is your initial set of key properties.
          c. Order the key properties by order of importance in your query.
     3. Do the key properties uniquely identify the entity? If not, include a unique identifier in the set of keys.
     4. If you have only 1 key property, use it as the PartitionKey.
     5. If you have only 2 key properties, use the first as the PartitionKey, and the second as the RowKey.
     6. If you have more than 2 key properties, you can try to concatenate them into two groups - the first concatenated group is the PartitionKey, and the second one is the RowKey. With this approach, your application would need to understand that the PartitionKey, for example, contains two keys separated by a "-".
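Step 6 above can be sketched as a small helper that joins key properties with the "-" separator and splits them apart again. The `KeyBuilder` class is hypothetical, and rejecting parts that already contain the separator is one possible design choice (an escaping scheme would be another); neither is prescribed by this document.

```csharp
using System;

// Hypothetical helper for building concatenated PartitionKey / RowKey values.
public static class KeyBuilder
{
    public const string Separator = "-";

    // Joins key properties into one key, in order of importance.
    public static string Combine(params string[] parts)
    {
        if (parts == null || parts.Length == 0)
            throw new ArgumentException("At least one key property is required.", nameof(parts));

        foreach (string part in parts)
        {
            // A part containing the separator could not be split back unambiguously.
            if (string.IsNullOrEmpty(part) || part.Contains(Separator))
                throw new ArgumentException(
                    "Key parts must be non-empty and must not contain the separator.", nameof(parts));
        }
        return string.Join(Separator, parts);
    }

    // Recovers the individual key properties from a combined key.
    public static string[] Split(string key)
    {
        return key.Split(new[] { Separator }, StringSplitOptions.None);
    }
}
```

With three key properties such as a channel, a year and a post identifier, the PartitionKey could be `KeyBuilder.Combine(channel, year)` and the RowKey the post identifier. Values that naturally contain "-" (for example dates written as 2009-05-01) would need a different separator or format.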

     3.2.3 Scalability

     Now that the application has its potential set of keys, you need to make sure that the partitioning chosen is scalable:

     1. Given the PartitionKey above, will it result in partitions that will become too hot, based on your application's access patterns, to be served efficiently from a single server? One way to determine if this would result in a hot partition is to implement a Table partition stress test. For this test, create a sample table using your keys and then exert peak stress for your given workload on a single partition to ensure that the table partition can provide the desired throughput for your application.
     2. If the Table partition stress test passes, then you are done.
     3. If the Table partition stress test does not pass, select a more fine-grained PartitionKey. This could be done either by choosing a different PartitionKey or by modifying the existing PartitionKey (for example, by concatenating it with the next key property). The purpose of this is to create more partitions so that a single partition does not become too large or too hot.
     4. We designed the system to scale and be able to handle a large amount of traffic. However, an extremely high rate of requests may lead to request timeouts while the system load balances. In that case, reducing your request rate may decrease or eliminate errors of this type. In general, most users will not experience these errors regularly; however, if you are experiencing high or unexpected Timeout errors, contact us at the MSDN Windows Azure forum to discuss how to optimize your use of Windows Azure Table and prevent these types of errors in your application.
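Item 4 above suggests reducing the request rate when timeouts appear. One common way to do that, offered here as an assumption rather than as the method this document prescribes, is to wait an exponentially growing interval between retries. The `Backoff` class, its parameters and the doubling factor are all illustrative.

```csharp
using System;

// Hypothetical helper: computes how long to wait before a retry.
public static class Backoff
{
    // Delay before retry number `attempt` (0-based):
    // baseDelay * 2^attempt, capped at maxDelay.
    public static TimeSpan DelayFor(int attempt, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        if (attempt < 0)
            throw new ArgumentOutOfRangeException(nameof(attempt));

        // Cap the exponent so the multiplication stays well within range.
        double factor = Math.Pow(2, Math.Min(attempt, 30));
        double milliseconds = Math.Min(
            baseDelay.TotalMilliseconds * factor,
            maxDelay.TotalMilliseconds);
        return TimeSpan.FromMilliseconds(milliseconds);
    }
}
```

A caller would sleep for `Backoff.DelayFor(attempt, ...)` after each timeout and reset `attempt` to zero after a success. Suitable base and maximum delays depend on the workload and should come out of the stress test described in item 1.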


     3.2.4 Flexible Partitioning

     You may also need to consider the extensibility of the keys you choose, especially if the user traffic characteristics are still unclear when the keys are chosen. In that case, it will be important to choose keys that can be easily extended to allow finer partitioning, so that if the partitioning turns out to be too coarse-grained, you can still extend your current partitioning scheme. A detailed example is discussed later in this document.

     4 Programming Tables

     The following basic operations are supported on tables and entities:

     • Create a table or entity.
     • Retrieve a table or entity, with filters.
     • Update an entity (but not a table).
     • Delete a table or entity.
     • Entity Group Transactions that support transactions across entities in the same table and partition.

     To use tables in a .NET application, you can simply use ADO.NET Data Services. The following table summarizes the APIs. Since the ADO.NET Data Services API results in the transmission of REST packets, applications can choose to use REST directly. Besides allowing non-.NET languages access to the store, REST also allows fine-grained control over serialization and deserialization of entities, which is useful when dealing with scenarios such as a single table containing different kinds of entities, or wanting to have more properties for your object than is allowed for a given entity.

     Operation | ADO.NET Data Services | HTTP Verb | Resource | Description
     ----------+-----------------------+-----------+----------+------------
     Query | LINQ Query | GET | Table | Returns the list of all tables in this storage account. If a filter is present, it returns the tables matching the filter.
     Query | LINQ Query | GET | Entity | Returns all entities in the specified table or a subset of entities if filter criteria are specified.
     Update entire entity | UpdateObject & SaveChanges(SaveChangesOptions.ReplaceOnUpdate) | PUT | Entity | Updates property values within an entity. A PUT operation replaces the entire entity and can be used to remove properties.
     Update partial entity | UpdateObject & SaveChanges() | MERGE | Entity | Updates property values within an entity.
     Create new entity | AddObject & SaveChanges() | POST | Table | Creates a new table in this storage account.
     Create new entity | AddObject & SaveChanges() | POST | Entity | Inserts a new entity into the named table.
     Delete entity | DeleteObject & SaveChanges() | DELETE | Table | Deletes a table in this storage account.
     Delete entity | DeleteObject & SaveChanges() | DELETE | Entity | Deletes an entity from the named table.
     Entity group transaction | SaveChanges(SaveChangesOptions.Batch) | POST | $batch | Entity group transaction support is provided through a batch operation across entities having the same partition key in a single table. In ADO.NET Data Services, the Batch option passed to SaveChanges dictates that the request is sent as a single transaction.

     Advanced operations on tables include the following, which will be discussed in more detail below:

     • Pagination
     • Handling conflicts due to concurrent updates

     4.1 Versioning

     For all of the Windows Azure Storage solutions, we have introduced a new HTTP header called "x-ms-version". All changes to the storage APIs will be versioned by this header. This allows prior versions of commands executed against the storage system to continue to work as we extend the capabilities of the existing commands and introduce new commands.

     The x-ms-version header should be specified for all requests coming to Windows Azure Storage. If there is an anonymous request without a version, then the oldest supported version of that command will be executed by the storage system. By PDC 2009, we plan to require the x-ms-version header to be specified by all non-anonymous commands. Until then, if no version is specified for a given request, we assume that the version of the command the request wants to execute is the CTP version of the Windows Azure Storage APIs from PDC 2008. If a request comes in with an invalid x-ms-version, it will be rejected.

     The current supported version is "x-ms-version: 2009-04-14". This can be used for all commands and requests sent to Windows Azure Storage. The new functionality we introduce for Windows Azure Tables with this version is Entity Group Transactions, and specifying this version header is required to use this new feature.

     using System;
     using System.Data.Services.Client;
     using System.Globalization;
     using System.Net;

     // Add the version header using the SendingRequest event. This is the same
     // place where the date header would have been added. However, the
     // version header is not part of the canonicalized string used for
     // creating the signature.
     context.SendingRequest += new EventHandler<SendingRequestEventArgs>(
         delegate(object sender, SendingRequestEventArgs requestArgs)
         {
             HttpWebRequest request = requestArgs.Request as HttpWebRequest;
             request.Headers.Add(
                 "x-ms-date",
                 DateTime.UtcNow.ToString("R", CultureInfo.InvariantCulture));
             request.Headers.Add("x-ms-version", "2009-04-14");
             // ... add authorization header using shared key lite
         });

     4.2 Running Example

     In the examples below, we describe operations on a "Blogs" table. This table is used to hold blogs for a MicroBlogging application. The MicroBlogging application has two tables - Channels and Blogs. There is a list of Channels, and blogs are posted to a particular channel. For this application, users would subscribe to channels and they would get the new blogs for those channels every day. In this example, we only focus on the Blogs table, and give examples of the following steps for the Blogs table:

     1. Define the schema for the table
     2. Create the table
     3. Insert a blog into the table
     4. Get the list of blogs from the table
     5. Update a blog in the table
     6. Delete a blog from the table
     7. Insert multiple blogs in a table

     4.3 Defining the Entity Class for the Table

     The schema for a table is defined as a C# class. This is the model used by ADO.NET Data Services. The schema is known only to the client application, and simplifies data access. The server does not enforce this schema. The following shows the entity definition for the Blog entities to be stored in the Blogs table. Each blog entity has the following information:

     1. A channel name - the blog has been posted to this channel.
     2. The posted date.
     3. Text - the content of the blog body.
     4. Rating - the popularity of this blog.

     For this "Blogs" table, we choose the channel name to be the PartitionKey and the posted date to be the RowKey. The PartitionKey and RowKey are the keys in the "Blogs" table, and this is indicated by declaring the keys using an attribute on the class - DataServiceKey. The "Blogs" table is partitioned by the ChannelName. This allows the application to efficiently retrieve the latest blogs for a channel to which a user has subscribed. In addition to the keys, user specific attributes are declared as properties. All properties with a public getter and setter are stored in Windows Azure Table. So in the example below:

     • Text and Rating are stored for the entity instance in Windows Azure Table.
     • RatingAsString is not stored because it does not have a setter defined.
     • Id is not stored since its accessors are not public.

     using System.Data.Services.Common;

     [DataServiceKey("PartitionKey", "RowKey")]
     public class Blog
     {
         // ChannelName
         public string PartitionKey { get; set; }

         // PostedDate
         public string RowKey { get; set; }

         // User defined properties
         public string Text { get; set; }
         public int Rating { get; set; }

         // Not stored: no setter
         public string RatingAsString { get; }

         // Not stored: accessors are not public
         protected string Id { get; set; }
     }
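The rule stated above (only properties with a public getter and a public setter are stored) can be illustrated with reflection. This sketch mimics the rule as the text describes it. It is not the client library's actual serialization code, and the `SampleBlog` and `StoredProperties` names are hypothetical; `SampleBlog` copies the shape of the Blog class without the DataServiceKey attribute so that the block has no external dependencies.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical copy of the Blog entity shape, without the key attribute.
public class SampleBlog
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public string Text { get; set; }
    public int Rating { get; set; }
    public string RatingAsString { get; }     // no setter
    protected string Id { get; set; }         // accessors not public
}

public static class StoredProperties
{
    // Returns the names of properties that have both a public getter
    // and a public setter, sorted by name for a stable result.
    public static string[] NamesOf(Type entityType)
    {
        return entityType
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.GetGetMethod() != null && p.GetSetMethod() != null)
            .Select(p => p.Name)
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToArray();
    }
}
```

For `typeof(SampleBlog)` this yields PartitionKey, Rating, RowKey and Text. RatingAsString and Id are left out, matching the list above.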

     4.4 Creating a Table

     Next we show how to create the "Blogs" table for your storage account. Creating a table is the same as creating an entity in a master table called "Tables". Every storage account has this master table already defined, and every table used by a storage account must register the table name with this master table. The class definition for this master table is shown below, where the TableName property represents the name of the table that is to be created.

     [DataServiceKey("TableName")]
     public class TableStorageTable
     {
         public string TableName { get; set; }
     }

     The actual creation of the table happens as follows.
