
Write Ruby scripts for in-depth data mining of Twitter users

By Emily Nelson,2015-12-19 15:40

    Twitter and its APIs

    Although the early web was about human-machine interaction, today's web also involves interaction between machines, and this interaction is supported by web services. Most popular websites offer such services, from the various Google services to LinkedIn, Facebook, and Twitter. Through the APIs that these web services expose, external applications can query or manipulate a site's content.

    Web services can be implemented in a variety of ways. One of the most popular approaches today is Representational State Transfer (REST). One implementation of REST runs over the well-known HTTP protocol, which lets HTTP serve as the vehicle for a RESTful architecture (using standard HTTP operations such as GET, PUT, POST, and DELETE). The Twitter API is built as an abstraction on top of this medium. With this approach, you need no knowledge of REST, HTTP, or data formats such as XML or JSON; instead, you work with an object-based interface that integrates cleanly into the Ruby language.
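
    As a minimal sketch of how the standard HTTP operations map onto a RESTful resource, the following Ruby fragment builds one request object per operation with the standard Net::HTTP library. The URL api.example.com is a hypothetical placeholder, and no request is actually sent.

```ruby
require "net/http"
require "uri"

# Hypothetical resource URL, used only for illustration; nothing is sent.
uri = URI("https://api.example.com/users/42.json")

# Each standard HTTP operation has a matching request class in Net::HTTP.
requests = {
  "read"    => Net::HTTP::Get.new(uri),
  "replace" => Net::HTTP::Put.new(uri),
  "create"  => Net::HTTP::Post.new(uri),
  "delete"  => Net::HTTP::Delete.new(uri)
}

# Print the HTTP verb and the resource path for each action.
requests.each do |action, request|
  puts "#{action.ljust(8)} #{request.method} #{request.path}"
end
```

    A wrapper such as the Twitter gem hides this layer, so the scripts in this article never build requests by hand.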

    A quick tour of Ruby and Twitter

    Let's look at how to use the Twitter API from Ruby. First, we need to get the required resources. If, like me, you use Ubuntu Linux, use the apt framework.

    To get the latest full Ruby distribution (a download of about 13 MB), use this command line:

$ sudo apt-get install ruby1.9.1-full

    Then use the gem utility to grab the Twitter gem:

$ sudo gem install twitter

    You now have everything you need for this step, so let's test the Twitter wrapper. This example uses a shell called the Interactive Ruby Shell (IRB), which lets you execute Ruby commands and experiment with the language in real time. IRB has a great many features, but we use it only for some simple experiments.

    Listing 1 shows a session with IRB, divided into three parts for easier reading. The first part (lines 001 and 002) prepares the environment by importing the necessary runtime elements (the require method loads and executes the specified library). The next part (line 003) uses the Twitter gem to display the latest tweet from IBM developerWorks. As shown, it uses the user_timeline method of the Client::Timeline module to display a message. This first example illustrates Ruby's method chaining. The user_timeline method returns an Array of 20 messages, which is chained into the first method to extract the first message from the Array (first is a method of the Array class). The text field is then taken from that message and passed to the puts method for output.
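
    The method chaining described here can be tried without network access. The following sketch replaces the timeline with a small hand-built Array; the Tweet struct and its sample messages are stand-ins assumed for illustration, not the objects that the Twitter gem returns.

```ruby
# Stand-in for a tweet: only the text field is modeled here.
Tweet = Struct.new(:text)

# Stand-in for the Array that user_timeline returns (newest message first).
timeline = [
  Tweet.new("newest message"),
  Tweet.new("older message")
]

# Chained form, as in the IRB session.
puts timeline.first.text

# The same steps, one at a time.
newest = timeline.first   # Array#first returns the first element
text   = newest.text      # read the text field of that message
puts text
```

    Both forms print the same line; the chained form simply avoids naming the intermediate values.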

    The next part (line 004) uses the user-defined location field. This is a free-form field in which the user can provide useful or useless location information. In this example, the User module fetches the user information, from which the location field is taken.

    The last part (starting at line 005) explores the Twitter::Search module, which provides an extremely rich interface for searching Twitter. In this example, a search instance is created first (line 005), and then line 006 specifies a search for recent messages that contain the word why and are directed to the user LulzSec. The list of results has been trimmed and edited. Search settings persist, because the search instance maintains the filter conditions that were defined; you can remove these filter conditions by calling search.clear.

    Listing 1. Experimenting with the Twitter API through IRB

$ irb

    irb(main):001:0> require "rubygems"

    => true

    irb(main):002:0> require "twitter"

    => true

    irb(main):003:0> puts Twitter.user_timeline("developerworks").first.text

    dW Twitter is saving #IBM over $600K per month: will #Google+ add to that?> http://t.co/HiRwir7 #Tech #webdesign #Socialmedia #webapp #app

    => nil

irb(main):004:0> puts Twitter.user("MTimJones").location

    Colorado, USA

    => nil

irb(main):005:0> search = Twitter::Search.new

    => #<Twitter::Search ...

    @endpoint="https://api.twitter.com/1/",

    @user_agent="Twitter Ruby Gem 1.6.0",

    @oauth_token=nil, @consumer_secret=nil,

    @search_endpoint="https://search.twitter.com/",

    @query={:tude=>[], :q=>[]}, @cache=nil, @gateway=nil, @consumer_key=nil, @proxy=nil, @format=:json, @adapter=:net_http>

    irb(main):006:0> search.containing("why").to("LulzSec").

    result_type("recent").each do |r| puts r.text end

    @LulzSec why not stop posting and get a full time job! MYSQLi isn't hacking you .

    ...

    irb(main):007:0>

    Next, let's look at how Twitter models a user. You can do this through IRB as well, but I have reformatted the result to simplify the description of the internal structure of a Twitter user. Listing 2 shows the output of the user structure, which in Ruby is a Hashie::Mash. This structure is useful because it gives an object method-like accessors for its hash keys (as public object methods). As you can see from Listing 2, this object contains a wealth of information (user-specific and rendering information), including the user's current status (with geocoding information). A tweet also contains a large amount of information, which you can easily retrieve for visualization by using the user_timeline method.

    Listing 2. The internal structure of a Twitter user (Ruby perspective)

irb(main):007:0> puts Twitter.user("MTimJones")

    <#Hashie::Mash

    contributors_enabled=false

    created_at="Wed Oct 08 20:40:53 +0000 2008"

    default_profile=false default_profile_image=false

    description="Platform Architect and author (Linux, Embedded, Networking, AI)." favourites_count=1

    follow_request_sent=nil

    followers_count=148

    following=nil

    friends_count=96

    geo_enabled=true

    id=16655901 id_str="16655901"

    is_translator=false

    lang="en"

    listed_count=10

    location="Colorado, USA"

    name="M. Tim Jones"

notifications=nil

    profile_background_color="1A1B1F"

    profile_background_image_url="..."

    profile_background_image_url_https="..."

    profile_background_tile=false

    profile_image_url="http://a0.twimg.com/profile_images/851508584/bio_mtjones_normal.JPG"

    profile_image_url_https="..."

    profile_link_color="2FC2EF"

    profile_sidebar_border_color="181A1E" profile_sidebar_fill_color="252429"

    profile_text_color="666666"

    profile_use_background_image=true

    protected=false

    screen_name="MTimJones"

    show_all_inline_media=false

    status=<#Hashie::Mash

    contributors=nil coordinates=nil

    created_at="Sat Jul 02 02:03:24 +0000 2011" favorited=false

    geo=nil

    id=86978247602094080 id_str="86978247602094080" in_reply_to_screen_name="AnonymousIRC"

    in_reply_to_status_id=nil in_reply_to_status_id_str=nil in_reply_to_user_id=225663702 in_reply_to_user_id_str="225663702"

    place=<#Hashie::Mash

    attributes=<#Hashie::Mash>

bounding_box=<#Hashie::Mash

    coordinates=[[[-105.178387, 40.12596], [-105.034397, 40.12596],

    [-105.034397, 40.203495],

    [-105.178387, 40.203495]]]

    type="Polygon"

    >

    country="United States" country_code="US" full_name="Longmont, CO"

    id="2736a5db074e8201"

    name="Longmont" place_type="city" url="http://api.twitter.com/1/geo/id/2736a5db074e8201.json"

    >

    retweet_count=0

    retweeted=false

    source="web"

    text="@AnonymousIRC @anonymouSabu @LulzSec @atopiary @Anonakomis Practical reading

    for future reference... LULZ \"Prison 101\" http://t.co/sf8jIH9" truncated=false

    >

    statuses_count=79

    time_zone="Mountain Time (US & Canada)" url="http://www.mtjones.com"

    utc_offset=-25200

    verified=false

    >

=> nil

    irb(main):008:0>
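
    To illustrate why method-like accessors for hash keys are convenient, here is a minimal sketch of the idea. MiniMash is a hypothetical stand-in written for this example and is not the Hashie implementation; the field values are taken from Listing 2.

```ruby
# MiniMash is a hypothetical stand-in, not the Hashie implementation.
class MiniMash
  def initialize(hash)
    @data = hash
  end

  # Treat an unknown method name as a hash key lookup.
  def method_missing(name, *args)
    key = name.to_s
    return super unless args.empty? && @data.key?(key)
    value = @data[key]
    # Wrap nested hashes so that accessors can be chained.
    value.is_a?(Hash) ? MiniMash.new(value) : value
  end

  def respond_to_missing?(name, include_private = false)
    @data.key?(name.to_s) || super
  end
end

user = MiniMash.new(
  "screen_name"     => "MTimJones",
  "followers_count" => 148,
  "status"          => { "source" => "web", "retweet_count" => 0 }
)

puts user.screen_name      # instead of user["screen_name"]
puts user.followers_count
puts user.status.source    # nested structures chain naturally
```

    This is why the scripts that follow can write a_user.location rather than indexing a hash by string key.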

    That completes the quick tour. Now let's explore some simple scripts that use Ruby and the Twitter API to collect and visualize data. Along the way, you will learn about some Twitter concepts, such as authentication and rate limiting.

    Mining Twitter data

    The next few sections introduce several scripts that collect and present data available through the Twitter API. The key to these scripts is their simplicity, but you can extend and combine them to create new functionality. These sections only touch on the Twitter gem API, which has many more functions available.

    It is important to note that the Twitter API allows a client only a limited number of calls within a given period of time. This is Twitter's rate limit on requests (currently no more than 150 per hour). After a certain amount of use, you will receive an error message and must wait for a period of time before submitting new requests.
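
    One way to cope with a rate limit is to wait and retry when a request is rejected. The sketch below shows the general pattern only: RateLimited is a hypothetical error class defined for this example (a real client library raises its own exception type), and the request is simulated so that the code runs without network access.

```ruby
# RateLimited is a hypothetical error class used only in this sketch;
# a real client library raises its own exception type.
class RateLimited < StandardError; end

# Runs the block, waiting and retrying when it reports a rate limit.
# The sleeper argument lets callers swap out the real sleep.
def with_rate_limit_retry(max_attempts: 3, wait_seconds: 60, sleeper: method(:sleep))
  attempts = 0
  begin
    attempts += 1
    yield
  rescue RateLimited
    raise if attempts >= max_attempts   # give up after the last attempt
    sleeper.call(wait_seconds)
    retry
  end
end

# Simulated request that is rejected twice before it succeeds.
calls = 0
waits = []
result = with_rate_limit_retry(wait_seconds: 5, sleeper: ->(s) { waits << s }) do
  calls += 1
  raise RateLimited if calls < 3
  "ok after #{calls} calls"
end

puts result
puts "waited #{waits.sum} seconds in total"
```

    With a limit of 150 requests per hour, spacing requests about 24 seconds apart (3600 / 150) would keep a long-running script under the limit without relying on retries.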

    User information

    Recall from Listing 2 the large amount of information available for each Twitter user. This information is accessible only if the user is not protected. Let's see how to extract a user's information and present it in a more convenient way.

    Listing 3 shows a simple Ruby script that retrieves a user's information (based on the user's screen name) and then displays some of the more useful content, using the Ruby to_s method to convert values to strings where needed. Note that the script first checks that the user is not protected; otherwise, the user's data cannot be accessed.
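
    The need for to_s comes from the fact that Ruby does not implicitly convert other types when concatenating onto a String with +. A short sketch, using the follower count from this article and assuming that an empty profile field is returned as nil:

```ruby
followers_count = 48439   # Integer, like the followers_count field
location = nil            # assumption: an empty profile field is nil

# String + Integer raises TypeError, so convert first.
puts "Followers : " + followers_count.to_s

# nil.to_s is the empty string, so an empty field prints as blank.
puts "Location : " + location.to_s

# String interpolation calls to_s implicitly.
puts "Followers : #{followers_count}"
```

    Interpolation is often the more idiomatic choice, while explicit to_s keeps the concatenation style used in the listings.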

    Listing 3. A simple script for extracting Twitter user data (user.rb)

#!/usr/bin/env ruby

require "rubygems"
require "twitter"

# The screen name to look up is the first command-line argument.
screen_name = String.new ARGV[0]

a_user = Twitter.user(screen_name)

# Data is available only for users who are not protected.
if a_user.protected != true

    # to_s guards against non-string and empty (nil) fields.
    puts "Username : " + a_user.screen_name.to_s
    puts "Name : " + a_user.name.to_s
    puts "Id : " + a_user.id_str.to_s
    puts "Location : " + a_user.location.to_s
    puts "User since : " + a_user.created_at.to_s
    puts "Bio : " + a_user.description.to_s
    puts "Followers : " + a_user.followers_count.to_s
    puts "Friends : " + a_user.friends_count.to_s
    puts "Listed Cnt : " + a_user.listed_count.to_s
    puts "Tweet Cnt : " + a_user.statuses_count.to_s
    puts "Geocoded : " + a_user.geo_enabled.to_s
    puts "Language : " + a_user.lang.to_s
    puts "URL : " + a_user.url.to_s
    puts "Time Zone : " + a_user.time_zone.to_s
    puts "Verified : " + a_user.verified.to_s
    puts

    # Show the user's most recent tweet.
    tweet = Twitter.user_timeline(screen_name).first

    puts "Tweet time : " + tweet.created_at.to_s
    puts "Tweet ID : " + tweet.id.to_s
    puts "Tweet text : " + tweet.text.to_s

end

    To call this script, make sure that it is executable (chmod +x user.rb), and then invoke it with a user's screen name. Listing 4 shows the result of a call for the user developerworks, giving the user's information and current status (the last tweet). Note that Twitter defines the people who follow you as followers (fans) and the people you follow as friends.

    Listing 4. Sample output from user.rb

$ ./user.rb developerworks

    Username : developerworks

    Name : developerworks

    Id : 16362921

    Location :

    User since : Fri Sep 19 13:10:39 +0000 2008

    Bio : IBM's premier Web site for Java, Android, Linux, Open Source, PHP, Social, Cloud Computing, Google, jQuery, and Web developer educational resources

    Followers : 48439

    Friends : 46299

    Listed Cnt : 3801

    Tweet Cnt : 9831

Geocoded : false

    Language : en

    URL : http://bit.ly/EQ7te

    Time Zone : Pacific Time (US & Canada)

    Verified : false

Tweet time : Sun Jul 17 01:04:46 +0000 2011

    Tweet ID : 92399309022167040

    Tweet text : dW Twitter is saving #IBM over $600K per month: will #Google+ add to that? > http://t.co/HiRwir7 #Tech #webdesign #Socialmedia #webapp #app

    Friend popularity

    Look at your friends (the people you follow) and collect data to learn how popular they are. In this example, data about the friends is collected and sorted by follower count, using the simple script shown in Listing 5.

    In this script, once the user to analyze is known (based on the screen name), a user hash table is created. A Ruby hash (or associative array) is a data structure that lets you define the key used for storage (rather than a simple numeric index). The hash table is then indexed by Twitter screen name, and the associated value is the user's follower count. The process simply iterates over your friends, puts each one's follower count into the hash table, sorts the hash table (in descending order), and then prints it.
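
    The hash-and-sort step can be sketched on its own, without calling Twitter. In the fragment below, the first two follower counts come from the listings in this article, and "example_user" is a made-up entry added for illustration.

```ruby
# Follower counts keyed by screen name. The first two values come from
# the listings in this article; "example_user" is a made-up entry.
followers = {
  "MTimJones"      => 148,
  "developerworks" => 48439,
  "example_user"   => 1000
}

# sort_by yields [key, value] pairs; negate the count for descending order.
ranked = followers.sort_by { |_name, count| -count }

ranked.each do |name, count|
  puts "#{name.ljust(16)} #{count}"
end
```

    Sorting a Hash returns an Array of [key, value] pairs, which is why the final loop destructures each element into a name and a count.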

    Listing 5. A script for friend popularity (friends.rb)

#!/usr/bin/env ruby

    require "rubygems"

    require "twitter"
