
A Guide to Real-Time Visualization of Geographic Data with Amazon Kinesis

By Lewis Austin, 2015-12-22 14:29


Amazon Kinesis is a fully managed service for large-scale, real-time data processing. Whether you are building a system that collects data from remote sensors, aggregating logs from multiple applications on different servers, or creating a new Internet of Things (IoT) solution, Amazon Kinesis can meet your requirements for collecting and processing terabytes of data per hour from tens of thousands of different sources.

For many of these systems, the location the data comes from matters greatly to users. For example, an alarm emitted by a remote sensor has little effect unless the user can pin down where the incident occurred. For users, the most effective way to visualize geographic data is to draw it on a map. In this article, we show you how to use Amazon Kinesis to build a system that supports geotagged streaming data, together with two simple visualizations that let users take in this information quickly. The first visualization renders events on a globe and is very effective for small numbers of events.

The second visualization scales to a much larger number of events: it draws a heat map of the events over a period of time.

Below is an overview of the system's architecture. Generated data is pushed to Amazon Kinesis. An Amazon Kinesis application then processes these records and stores the relevant geographic information in an Amazon ElastiCache Redis cluster. A Node.js web server running on Elastic Beanstalk is responsible for the data visualization.

The whole system uses Java and JavaScript code, but don't worry if your development environment does not support these languages; all of our code is compiled on Amazon Elastic Compute Cloud (Amazon EC2) instances.

    Important note

Running the AWS system described in this article will incur charges for Amazon Kinesis and other AWS services. Where possible, we use resources covered by the AWS Free Tier. In other words, even where standard AWS charges apply, the cost is kept to a minimum.

A geotagged data source

Since most readers will not have access to a network of connected devices or sensors, the use case in this article needs a suitable substitute source of geotagged data. About 500 million Tweets are posted every day, and Twitter provides developers with a small sample of them through its streaming API. If a Tweet is sent from a mobile device, it may be geotagged, making clear where its author published it. You can sign up for a Twitter developer account and then set up an application. Make sure your application is set to read-only access, then click the Create My Access Token button at the bottom of the Keys and Access Tokens panel. After this step, you will have the four secret keys for your Twitter application: the Consumer Key (API Key), the Consumer Secret (API Secret), the Access Token, and the Access Token Secret. Once you have these keys, you are ready to set up the AWS side of the solution.
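These secrets should stay out of your source code. As a minimal sketch of loading them at runtime, assuming the AwsUserData.properties file that the setup script later in this article writes (the property names below match that script):

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Properties;

Properties props = new Properties();
try (InputStream in = new FileInputStream("AwsUserData.properties")) {
    props.load(in);
}
String consumerKey = props.getProperty("twitter.consumerKey");
String consumerSecret = props.getProperty("twitter.consumerSecret");
String token = props.getProperty("twitter.token");
String secret = props.getProperty("twitter.secret");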

Create a Security Group

Before we set up the rest of the system, we need to build a security group for the servers it will use. On the AWS Console, create a security group with two rules. The first allows traffic on the SSH port (22), which lets us connect to our servers. The second allows traffic on all ports, for any protocol, when the source is the ID of this same security group. This admits incoming data from instances assigned to the same security group and ensures that all servers in the system can communicate with each other.
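If you prefer to script this step, the sketch below uses the AWS SDK for Java; the group name and description are our own example values, and the console steps above achieve exactly the same result.

import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2Client;
import com.amazonaws.services.ec2.model.AuthorizeSecurityGroupIngressRequest;
import com.amazonaws.services.ec2.model.CreateSecurityGroupRequest;
import com.amazonaws.services.ec2.model.IpPermission;
import com.amazonaws.services.ec2.model.UserIdGroupPair;

AmazonEC2 ec2 = new AmazonEC2Client(); // uses the default credential chain

// Create the group (name and description are example values)
String groupId = ec2.createSecurityGroup(new CreateSecurityGroupRequest()
        .withGroupName("kinesis-viz")
        .withDescription("Kinesis visualization servers")).getGroupId();

// Rule 1: allow SSH (port 22) from anywhere
IpPermission ssh = new IpPermission().withIpProtocol("tcp")
        .withFromPort(22).withToPort(22).withIpRanges("0.0.0.0/0");

// Rule 2: allow any protocol on all ports when the source is this group itself
IpPermission internal = new IpPermission().withIpProtocol("-1")
        .withUserIdGroupPairs(new UserIdGroupPair().withGroupId(groupId));

ec2.authorizeSecurityGroupIngress(new AuthorizeSecurityGroupIngressRequest()
        .withGroupId(groupId).withIpPermissions(ssh, internal));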

    Set up an Amazon Kinesis Stream

The next task is to create an Amazon Kinesis stream. On the AWS Console, go to the Amazon Kinesis page for your Region and click the Create Stream button. Give your stream a name and select the number of shards. An Amazon Kinesis stream consists of one or more shards, each providing 1 MB/sec of data input and 2 MB/sec of data output. The total capacity of a stream is simply calculated from the number of shards, which also makes it easy to scale a stream's capacity up or down by adding or removing shards. Looked at individually, each shard can handle 1,000 write transactions per second. During our testing for this article, the Twitter streaming API always ran below 1,000 Tweets per second, so we only need one shard.
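If you would rather create the stream programmatically, a minimal sketch with the AWS SDK for Java follows; the stream name is an example.

import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClient;

AmazonKinesis kinesis = new AmazonKinesisClient();

// One shard: 1 MB/sec in, 2 MB/sec out, up to 1,000 writes per second
kinesis.createStream("TwitterStream", 1);

// The stream takes a short while to become ACTIVE before it accepts writes
String status = kinesis.describeStream("TwitterStream")
        .getStreamDescription().getStreamStatus();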

    Set up an Amazon ElastiCache instance

Next, go to the Amazon ElastiCache page in the AWS Console and click the Launch Cache Cluster button. On the next page, set up a single-node Redis cluster: choose a port for the cache and set the Node Type to cache.m3.medium. Make sure you use the security group established earlier. Once the Amazon ElastiCache server is up, you can find the cluster's endpoint name by opening the Cache Clusters page in the left-hand navigation and clicking the Nodes link for the cluster. Other parts of the system will use this endpoint name and port to read from and write to the cluster.
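To check connectivity from an instance in the security group, the following minimal Jedis sketch can help; the host name is a placeholder for your cluster's endpoint.

import redis.clients.jedis.Jedis;

// Substitute the endpoint name and port shown on the Nodes page
Jedis jedis = new Jedis("mycluster.abc123.0001.usw2.cache.amazonaws.com", 6379);

// Prints PONG if the node is reachable and the security group is correct
System.out.println(jedis.ping());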

The data producer (Producer)

If you are in a position to collect data from remote sensors, they can send their data directly to Amazon Kinesis. In our architecture, we need a "producer" to extract data from Twitter and transmit it to Amazon Kinesis. Twitter publishes an open-source Java library named Hosebird Client (HBC) that handles data extraction from their streaming API. The code to configure the client is as follows:

/**
 * Set up your blocking queues: be sure to size these properly based
 * on expected TPS of your stream.
 */
BlockingQueue<String> msgQueue = new LinkedBlockingQueue<String>(10000);

/**
 * Declare the host you want to connect to, the endpoint, and
 * authentication (basic auth or oauth).
 */
StatusesFilterEndpoint endpoint = new StatusesFilterEndpoint();

// Track anything that is geotagged
endpoint.addQueryParameter("locations", "-180,-90,180,90");

// These secrets should be read from a config file
Authentication hosebirdAuth = new OAuth1(consumerKey,
        consumerSecret, token, secret);

// Create a new basic client - by default gzip is enabled
Client client = new ClientBuilder().hosts(Constants.STREAM_HOST)
        .endpoint(endpoint).authentication(hosebirdAuth)
        .processor(new StringDelimitedProcessor(msgQueue)).build();

client.connect();

As shown in the code, we use addQueryParameter on the endpoint to filter the data so that only geotagged Tweets are sent. The HBC GitHub page has dead-simple examples of how to filter on search terms; for example, you could limit the analysis to Tweets carrying a specified hashtag. One thing to note: if you use more than one filter, they are ORed together. If you added trackTerms to the code above, the results would contain Tweets that either match your search terms or are geotagged, which is not what we want. Typically, you would have HBC apply the most restrictive filter (usually a search term) to limit the size of the data returned from Twitter, and then apply the second filter in code.
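As a rough sketch of that pattern (the search term and the in-code geo check below are our illustration, not the blog's actual code):

import java.util.Arrays;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

// The strictest filter goes to Twitter: only Tweets matching the term arrive
endpoint.trackTerms(Arrays.asList("aws"));

// The second filter is applied in code: keep only geotagged Tweets
ObjectMapper mapper = new ObjectMapper();
String msg = msgQueue.take();
JsonNode node = mapper.readTree(msg);
if (node.hasNonNull("geo")) {
    // only now hand the Tweet to the producer
}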

With the settings above (a one-shard stream), at most 1,000 transactions per second can be handled. Pushing data through a single thread at that rate creates a problem: it leaves only 1 millisecond to process each Tweet. Considering network latency alone, it becomes difficult to call Amazon Kinesis that fast. The data producer therefore uses a thread pool to make concurrent calls. The following code shows how:

// Create producer
ProducerClient producer = new ProducerBuilder()
        .withName("Twitter")
        .withStreamName(streamName)
        .withRegion(regionName)
        .withThreads(10)
        .build();

producer.connect();

The name given to the producer is used later for debugging, and the stream and Region names are variables that can be drawn from the configuration file. In the code above, we set up a thread pool of capacity 10 to process the messages to be transferred to Amazon Kinesis. The code below shows how Tweets from HBC are connected to the data producer:

// Get message from HBC queue
String msg = msgQueue.take();

// Use 'random' partition key
String key = String.valueOf(System.currentTimeMillis());

// Send to Kinesis
producer.post(key, msg);

Each Tweet arrives as a JSON string. The strings are taken from the HBC queue (msgQueue) and passed to the data producer. Internally, the producer puts these strings on its own queue, and a worker thread sends the data to Amazon Kinesis. Because we do not need the messages on a shard to be ordered, we use a quasi-random partition key here.
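For reference, a single put with the plain AWS SDK looks roughly like the sketch below; this is our approximation of what each worker thread does, not the producer library's actual code. Amazon Kinesis hashes the partition key (with MD5) to pick a shard, so a constantly changing key spreads records evenly.

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClient;
import com.amazonaws.services.kinesis.model.PutRecordRequest;

AmazonKinesis kinesis = new AmazonKinesisClient();
kinesis.putRecord(new PutRecordRequest()
        .withStreamName(streamName)
        .withPartitionKey(String.valueOf(System.currentTimeMillis()))
        .withData(ByteBuffer.wrap(msg.getBytes(StandardCharsets.UTF_8))));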

Before you run the producer on Amazon EC2, you need to set up an IAM role that allows the EC2 instance to access Amazon Kinesis and Amazon S3. In the AWS console, click IAM and create a role. The role is an AWS Service Role for Amazon EC2, with the following custom policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1392290776000",
      "Effect": "Allow",
      "Action": [
        "kinesis:*", "s3:Get*", "s3:List*"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}

Launch an Amazon EC2 instance of type m3.medium with the Amazon Linux AMI, specifying the IAM role and security group established above. Modify the script below, adding your Amazon Kinesis stream name, the Region you are using (for example, us-west-1), and the four secret keys you obtained when creating the Twitter application. Paste the modified script into the instance's User Data. Finally, review and launch your instance, remembering to keep a key pair for subsequent access.

#!/bin/bash
# update the instance
yum update -y
# install jdk
yum install java-1.8.0-openjdk -y
yum install java-1.8.0-openjdk-devel -y
yum install git -y
update-alternatives --set java /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.25-0.b18.4.amzn1.x86_64/jre/bin/java
cd /home/ec2-user
# install Apache Maven
wget http://www.dsgnwrld.com/am/maven/maven-3/3.2.3/binaries/apache-maven-3.2.3-bin.tar.gz
tar -xzvf apache-maven-3.2.3-bin.tar.gz
# get the code
git clone https://github.com/awslabs/aws-big-data-blog.git
cp ./aws-big-data-blog/aws-blog-kinesis-data-visualization/TwitterProducer/* /home/ec2-user -r
# create the config file
echo "aws.streamName = Name of Your Amazon Kinesis Stream" > AwsUserData.properties
echo "aws.regionName = Name of your region" >> AwsUserData.properties
echo "twitter.consumerKey = Twitter Consumer Key" >> AwsUserData.properties
echo "twitter.consumerSecret = Twitter Consumer Secret Key" >> AwsUserData.properties
echo "twitter.token = Twitter Access Token" >> AwsUserData.properties
echo "twitter.secret = Twitter Access Token Secret" >> AwsUserData.properties
echo "twitter.hashtags = " >> AwsUserData.properties
# do the build
/home/ec2-user/apache-maven-3.2.3/bin/mvn package

This script runs at instance startup and installs OpenJDK and Apache Maven. It then downloads the source code needed for the producer from Git and creates the configuration file. Finally, Apache Maven is used to build the source code. To start the producer, SSH to the instance and type:

    java -jar target/TwitterProducer-0.0.1-SNAPSHOT.jar AwsUserData.properties

After the instance starts, you may need to wait several minutes for Apache Maven and the JAR file to be in place. You should then see messages reporting Tweets being sent to Amazon Kinesis.

The Amazon Kinesis application

The Amazon Kinesis application retrieves the JSON Tweets from Amazon Kinesis, extracts the location information, and pushes it to the Amazon ElastiCache Redis cluster.
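An Amazon Kinesis application is typically built on the Kinesis Client Library (KCL). Before looking at the record processor itself, here is a rough sketch of how such an application is bootstrapped; the application name, worker id, and TweetLocationProcessor class are our placeholders, not the sample's actual names.

import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.services.kinesis.clientlibrary.interfaces.IRecordProcessor;
import com.amazonaws.services.kinesis.clientlibrary.interfaces.IRecordProcessorFactory;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.KinesisClientLibConfiguration;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker;

// The application name also names the DynamoDB table the KCL uses for leases
KinesisClientLibConfiguration config = new KinesisClientLibConfiguration(
        "TwitterViz", streamName,
        new DefaultAWSCredentialsProviderChain(), "worker-1");

// The factory hands the KCL one record processor per shard
IRecordProcessorFactory factory = new IRecordProcessorFactory() {
    public IRecordProcessor createProcessor() {
        return new TweetLocationProcessor(); // hypothetical processor class
    }
};

new Worker(factory, config).run(); // blocks, pulling records from each shard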

The application is based on the sample Amazon Kinesis application hosted on the AWS Labs GitHub page (a very good starting point for any Amazon Kinesis application). It uses Jedis to interact with Amazon ElastiCache Redis. The details of the Redis server are taken in when the record processor is constructed and its objects are initialized. To handle the records from Twitter, the following code is added to the record processor:

Coordinate c = null;
try {
    // For this app, we interpret the payload as UTF-8 chars.
    data = decoder.decode(record.getData()).toString();

    // Use the ObjectMapper to read the JSON string and create a tree
    JsonNode node = mapper.readTree(data);
    JsonNode geo = node.findValue("geo");
    JsonNode coords = geo.findValue("coordinates");

    Iterator<JsonNode> elements = coords.elements();
    double lat = elements.next().asDouble();
    double lng = elements.next().asDouble();
    c = new Coordinate(lat, lng);
} catch (Exception e) {
    // If we get here, it's bad data; ignore it and move on to the next record
}

if (c != null) {
    String jsonCoords = mapper.writeValueAsString(c);
    jedis.publish("loc", jsonCoords);
}

As the code shows, the Tweet's geographic location is extracted from the JSON message we receive from Amazon Kinesis. This information is packaged into a JSON object and published to a Redis channel. If you expect to run an Amazon Kinesis application in a production environment, consider hosting the code on Elastic Beanstalk as described in the blog post "Hosting Amazon Kinesis Applications on AWS Elastic Beanstalk". Alternatively, you can package the application as a JAR and run it directly on a server. Whichever approach you use, you must create an IAM role for the target server. Visit the IAM console and create an AWS Service Role with the following policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1392290776000",
      "Effect": "Allow",
      "Action": [
        "kinesis:*", "cloudwatch:*", "dynamodb:*", "elasticache:*", "s3:Get*", "s3:List*"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
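As an aside, the coordinates that the record processor publishes on the "loc" channel can be consumed by any Redis subscriber; the Node.js visualization server does the equivalent in JavaScript. A minimal Jedis sketch, assuming a recent Jedis version where the other JedisPubSub callbacks have default no-op implementations (the host name is again a placeholder):

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPubSub;

Jedis sub = new Jedis("mycluster.abc123.0001.usw2.cache.amazonaws.com", 6379);
sub.subscribe(new JedisPubSub() {
    @Override
    public void onMessage(String channel, String message) {
        // message is the JSON-encoded Coordinate published by the processor
        System.out.println(channel + ": " + message);
    }
}, "loc"); // blocks while subscribed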

If you are running your Amazon Kinesis application outside Elastic Beanstalk, set up an Amazon EC2 instance (m3.medium with the Amazon Linux AMI) and assign it the IAM role established above. Again, be sure to use the security group created earlier; it allows the instance to communicate with the Redis instance. Fill in the Amazon Kinesis stream name, the Region name, and the endpoint name and port of the Amazon ElastiCache Redis cluster you created, then copy the following script into the "User Data" section of the instance configuration:

#!/bin/bash

# update instance
yum update -y
# install jdk
yum install java-1.8.0-openjdk -y
yum install java-1.8.0-openjdk-devel -y
yum install git -y
update-alternatives --set java /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.25-0.b18.4.amzn1.x86_64/jre/bin/java
cd /home/ec2-user
# install Apache Maven
wget http://www.dsgnwrld.com/am/maven/maven-3/3.2.3/binaries/apache-maven-3.2.3-bin.tar.gz
tar -xzvf apache-maven-3.2.3-bin.tar.gz
# get the code
git clone https://github.com/awslabs/aws-big-data-blog.git
cp ./aws-big-data-blog/aws-blog-kinesis-data-visualization/KinesisApplication/* /home/ec2-user -r
