Objective:

The objective of this project is to crawl data for a subset (100-500) of a social media platform's users and report an analysis of the extracted data.

About the project:

In this project, we collected a list of Twitter users and built a network from them, which we then analyzed as a graph using Python. For reasons explained later in this report, the users were restricted to major National Football League (NFL) players. Further, the connections of each player were filtered to include only those users who are part of our network.

Two different graphs of the network were studied: a directed graph and an undirected graph.

Finally, different network measures, such as the degree distribution and measures of centrality, were calculated for each of these graphs.

Project Requirements:

Other than the standard Python libraries, the following two libraries are required to run the project:

Project Outline:

The project is broadly divided into the following major steps:

  1. Data collection and filtering
  2. Building a network structure
  3. Calculating network measures
  Step 1: Data collection and filtering
    • We selected Twitter for this project because a vast number of people use it, which in turn generates a large amount of data. This volume of data supports better insights and helps achieve the goal of the project.
    • However, under the Twitter API rules, information can be retrieved from a Twitter account only if the account is ‘public’ or if the API owner follows that account.
    • As a general observation, celebrities usually keep their accounts public. To narrow the scope, we formed a network consisting only of National Football League players.
    • To avoid listing these players manually, the ‘scraper.py’ file (attached with the project) was used to obtain a list of the players’ names.
    • This file uses ‘beautifulsoup’ to fetch the information from https://www.fantasypros.com/nfl/cheatsheets/top-players.php.
    • Parsing the information returns the required list.
    • Using this list and the Twitter API from the ‘python-twitter’ package, we obtained the required list of users and their information.
    • Keeping only ‘verified’ users ensures that the resulting accounts are indeed the players’ genuine accounts.
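The two filtering rules described above (keep only verified accounts, and keep only connections that point to other users in the network) can be illustrated with a minimal sketch. The record layout used here (dictionaries with `id`, `name`, `verified` and `friend_ids` keys) and the function names are assumptions made for illustration only; they are not the structures returned by ‘python-twitter’ or used in the project files.

```python
def keep_verified(users):
    """Return only the user records flagged as verified."""
    return [user for user in users if user.get("verified")]


def restrict_to_network(users):
    """Map each user id to the ids it is connected to, keeping only
    connections whose target is also a member of the network."""
    member_ids = {user["id"] for user in users}
    return {
        user["id"]: sorted(set(user["friend_ids"]) & member_ids)
        for user in users
    }


# Hypothetical sample records (not real accounts).
users = [
    {"id": 1, "name": "player_a", "verified": True, "friend_ids": [2, 3, 99]},
    {"id": 2, "name": "player_b", "verified": True, "friend_ids": [1]},
    {"id": 3, "name": "fan_account", "verified": False, "friend_ids": [1, 2]},
]

verified = keep_verified(users)
connections = restrict_to_network(verified)
print([user["name"] for user in verified])
print(connections)
```

In this sample, the unverified account is dropped first, so connections to it (and to the outside id 99) no longer appear in the result.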
  Step 2: Building a network structure
    • The networkx library was used to build the graph of the network from the information collected in the previous step.
    • As a result, a directed graph and an undirected graph were generated.
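A minimal sketch of this step with networkx is given below. It assumes that an edge points from a follower to the account being followed and that the undirected graph is obtained by dropping edge direction with `to_undirected()`; the report does not state either detail, so both are assumptions. The function name `build_graphs` and the sample names are hypothetical.

```python
import networkx as nx


def build_graphs(players, follows):
    """Build a directed graph and its undirected counterpart.

    players: iterable of node names.
    follows: iterable of (follower, followed) pairs.
    """
    members = set(players)
    directed = nx.DiGraph()
    directed.add_nodes_from(members)
    # Keep only edges whose endpoints are both part of the network.
    directed.add_edges_from(
        (u, v) for u, v in follows if u in members and v in members
    )
    # Assumption: the undirected graph simply ignores edge direction.
    undirected = directed.to_undirected()
    return directed, undirected


# Hypothetical sample data.
players = ["player_a", "player_b", "player_c", "player_d"]
follows = [
    ("player_a", "player_b"),
    ("player_b", "player_a"),
    ("player_a", "player_c"),
    ("player_c", "player_d"),
    ("player_d", "outsider"),
]

directed, undirected = build_graphs(players, follows)
print(directed.number_of_edges(), undirected.number_of_edges())
```

A mutual follow contributes two edges to the directed graph but only one to the undirected graph, which is why the two edge counts differ.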
  Step 3: Calculating network measures
    • The following network measures were calculated for each graph:
      • Degree distribution, plotted as a histogram
      • Centrality measures: betweenness, closeness, eigenvector and PageRank
      • Diameter of the graph
      • Reciprocity
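The measures listed above map onto standard networkx functions. The sketch below is one possible way to compute them and is not taken from the project code: the helper names `top_k` and `network_measures` are hypothetical, reciprocity is computed only for the directed graph (networkx does not define it for undirected graphs), and the diameter is reported only when the graph is connected (strongly connected in the directed case), since `nx.diameter` raises an error otherwise. Plotting the histogram is left out; `nx.degree_histogram` returns the counts that would be plotted.

```python
import networkx as nx


def top_k(scores, k=5):
    """Return the k highest-scoring (node, score) pairs."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


def network_measures(graph, k=5):
    """Compute the measures listed in the report for one graph."""
    measures = {
        # Entry i is the number of nodes with degree i.
        "degree_histogram": nx.degree_histogram(graph),
        "betweenness": top_k(nx.betweenness_centrality(graph), k),
        "closeness": top_k(nx.closeness_centrality(graph), k),
        "eigenvector": top_k(
            nx.eigenvector_centrality(graph, max_iter=1000), k
        ),
        "pagerank": top_k(nx.pagerank(graph), k),
    }
    if graph.is_directed():
        # Fraction of directed edges that are reciprocated.
        measures["reciprocity"] = nx.reciprocity(graph)
        connected = nx.is_strongly_connected(graph)
    else:
        connected = nx.is_connected(graph)
    # The diameter is undefined when some node cannot reach another.
    measures["diameter"] = nx.diameter(graph) if connected else None
    return measures


# Hypothetical toy graph: two mutual follows plus two one-way follows.
toy = nx.DiGraph(
    [("a", "b"), ("b", "a"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "c")]
)
result = network_measures(toy)
print(result["degree_histogram"], result["diameter"], result["reciprocity"])
```

The same function can be called on the undirected graph, in which case the result contains no reciprocity entry.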

Results

For the Directed Graph of the network:

Graph

Degree Distribution

Network Measures:

Centrality measure | 1 | 2 | 3 | 4 | 5
Top 5 betweenness centrality | James Conner, 0.0421 | George Kittle, 0.0402 | Ezekiel Elliott, 0.0344 | Patrick Mahomes II, 0.033 | Trey Burton, 0.0315
Top 5 closeness centrality | Jarvis Juice Landry, 0.5727 | Ezekiel Elliott, 0.5659 | Mark Ingram II, 0.5642 | Deandre Hopkins, 0.5625 | Saquon Barkley, 0.5592
Top 5 eigenvector centrality | Jarvis Juice Landry, 0.1943 | Lamar Jackson, 0.1816 | Ezekiel Elliott, 0.1766 | Keenan Allen, 0.1733 | Deandre Hopkins, 0.1695
Top 5 pagerank centrality | Mark Ingram II, 0.0195 | Larry Fitzgerald, 0.0162 | Mike Evans, 0.0137 | Tom Brady, 0.0135 | Tyler Lockett, 0.0135

For the undirected graph of the network:

Graph

Degree Distribution

Network Measures:

Centrality measure | 1 | 2 | 3 | 4 | 5
Top 5 betweenness centrality | Trey Burton, 0.048 | Miles Sanders, 0.0327 | George Kittle, 0.0298 | DIGGS, 0.0292 | Ezekiel Elliott, 0.0264
Top 5 closeness centrality | Trey Burton, 0.6077 | Ezekiel Elliott, 0.587 | DIGGS, 0.5851 | Miles Sanders, 0.5833 | Jarvis Juice Landry, 0.5833
Top 5 eigenvector centrality | Trey Burton, 0.1716 | Jarvis Juice Landry, 0.165 | Ezekiel Elliott, 0.1618 | DIGGS, 0.1549 | Lamar Jackson, 0.1513
Top 5 pagerank centrality | Trey Burton, 0.0145 | Miles Sanders, 0.0121 | DIGGS, 0.012 | Ezekiel Elliott, 0.0119 | Jarvis Juice Landry, 0.0115

Inferences

References