Create an HDInsight cluster using the Windows Azure Management portal
In this video I’ll show you one way to create an
HDInsight cluster
using the Windows Azure Management portal.
In less than 15 minutes, I'll have the nodes in my
cluster created, my file system and
storage configured, and my cluster up and
running.
HDInsight provides everything I need to quickly
deploy Apache Hadoop clusters
that run on Windows Azure.
HDInsight also provides a cost-efficient approach
to managing and storing data using
Windows Azure Blob storage.
To begin, I log in to the Management Portal.
At the bottom of the screen, I click New.
I select Data Services > HDInsight. At this point I
have two choices: Quick Create and Custom Create.
The Quick Create option lets me name my cluster,
choose from a list of cluster sizes,
add a password for the default Admin account,
and link to an existing storage account.
But I want more control over my cluster, so for
this video, I’ll use Custom Create.
I name my cluster videocluster. The name I use becomes the address for
my cluster in the azurehdinsight.net domain.
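As a rough illustration of how the name maps to an address, here is a minimal Python sketch. It assumes the address is simply the cluster name placed in front of the azurehdinsight.net domain; the function name cluster_address is made up for this example and is not part of any Azure tool.

```python
# Hypothetical helper: shows how a cluster name could map to its address.
# Assumption: the address is "<cluster name>.azurehdinsight.net".
HDINSIGHT_DOMAIN = "azurehdinsight.net"


def cluster_address(cluster_name: str) -> str:
    """Return the assumed host name for a cluster with the given name."""
    name = cluster_name.strip()
    if not name:
        raise ValueError("cluster name must not be empty")
    return f"{name}.{HDINSIGHT_DOMAIN}"


if __name__ == "__main__":
    print(cluster_address("videocluster"))
```

Because the name becomes part of an address, it is worth choosing it carefully before creating the cluster.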
Four nodes is perfect. I don’t want to enter more
than I need, as this increases the price.
And I’m going to choose version 3.0, which uses Hadoop 2.2, to create the cluster.
Selecting the correct datacenter Region is
important when I'm creating a cluster to use with
an existing Windows Azure storage account.
This cluster must be created in the same region
that contains the existing storage account.
Storing my data and my results in the region
where I create my cluster to run my jobs
ensures better performance and lower costs.
I select West US.
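The same-region rule can be written down as a small check. This is only a sketch in Python under stated assumptions: the StorageAccount class, the check_same_region function, and the account name existingaccount are all hypothetical and used for illustration, and the region names are compared as plain text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageAccount:
    """Hypothetical record describing an existing storage account."""
    name: str
    region: str


def check_same_region(cluster_region: str, account: StorageAccount) -> None:
    """Raise ValueError if the cluster region differs from the account's region."""
    # Compare case-insensitively and ignore surrounding whitespace.
    if cluster_region.strip().lower() != account.region.strip().lower():
        raise ValueError(
            f"cluster region {cluster_region!r} does not match region "
            f"{account.region!r} of storage account {account.name!r}"
        )


if __name__ == "__main__":
    existing = StorageAccount(name="existingaccount", region="West US")
    check_same_region("West US", existing)  # passes silently
    print("regions match")
```

Running a check like this before creating the cluster catches a mismatched region early, instead of after provisioning has started.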
Now I’ll set up the user account for my cluster.
I start by entering the user name, and then create
a password.
I’m not using a Metastore, so I ignore the
remaining fields. On the next
page, I tell HDInsight which storage account to
use.
One of the many benefits of HDInsight is that I
can choose where to store the data that
I’ll analyze and the data that are created as a
result of the analysis.
A native HDFS file system stores the data local to
the compute nodes in the Hadoop cluster,
while an Azure Blob storage container provides a
file system that stores my data in persistent
cloud storage that is separate from my cluster.
Keeping my data separate means that I can set
up my cluster, run my jobs, and delete the cluster
to reduce expense while my data persists,
safe and unaffected, in Windows Azure Blob storage.
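To make the separation concrete: Hadoop jobs on HDInsight address data in Blob storage through wasb:// URIs that name the container and the storage account rather than any cluster node. The Python sketch below only builds such a URI. The helper name wasb_uri and the names mycontainer and mystorageaccount are placeholders for this example, not values from this video.

```python
def wasb_uri(container: str, account: str, path: str = "") -> str:
    """Build a wasb:// URI for a path inside a Blob storage container.

    The container and account names passed in are placeholders
    chosen by the caller; nothing here contacts Azure.
    """
    if not container or not account:
        raise ValueError("container and account must not be empty")
    # Drop any leading slash so the URI has exactly one after the host.
    relative_path = path.lstrip("/")
    return f"wasb://{container}@{account}.blob.core.windows.net/{relative_path}"


if __name__ == "__main__":
    print(wasb_uri("mycontainer", "mystorageaccount", "/example/data/input.txt"))
```

Since the URI identifies the storage account and container, not the cluster, the same data can be reached again from a new cluster after the old one is deleted.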
My first decision on the Storage screen is
whether I want to use an existing storage
account, create a new storage account, or use
storage from another subscription.
When I select Use Existing Storage, Azure auto-populates
my Account Name and Default
container. The container is where my data is
stored within my Blob storage account. The storage container
will be used as my default file system. I can use
the default container or create a new one.
The default container in my existing storage
account contains data that I want to
analyze with this cluster.
But I also want to use this cluster to analyze
some new data that I plan to upload later,
so I’ll also create one new storage account.
For my new storage account, I have the same
three choices. This time I’ll create a new storage
account and name it mihart.
When I click the checkmark, it takes several
minutes to provision the cluster.
I know it’s done when I see Running in the status
column.
My new storage account is also being created,
and shows Online when that job completes.
Dig into the cluster by selecting the HDInsight tab
and then double-clicking the cluster name.
For more HDInsight videos and content, please
visit windowsazure.com. Thank you.