Healthcare Twitter Analysis Data

About the Data

All of the tweets for this project have been processed and consolidated into a single file, HTA_noduplicates.gz
(1.85 GB zipped / 15.80 GB unzipped).

Each of the 4 million rows in this file is a tweet in JSON format.

Every record contains the following information:
- All the Twitter data, in exactly the JSON format of the original
- Unix timestamp
- Data from the original project files:
  - originating file name
  - score
  - author screen name
  - URLs
In addition, 60% of the records have geographic information:
- Latitude & Longitude
- Country name & ISO2 country code
- City
- For country code "US":
  - Zipcode
  - Telephone area code
  - Square miles inside the zipcode
  - 2010 Census population of the zipcode
  - County & FIPS code
  - State name & USPS abbreviation
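Because the unzipped file is far too large to load into memory at once, it is usually easier to stream it. The following Node.js sketch tallies which top-level keys appear in the first few records, which is one way to check which of the fields listed above a given record carries. It assumes the file holds one JSON object per line; parseRecord and tallyKeys are illustrative names, not part of the project code.

```javascript
const fs = require('fs');
const zlib = require('zlib');
const readline = require('readline');

// Parse one line of the file; returns null for blank or malformed lines.
// Assumption: each row of the file is a single JSON object on its own line.
function parseRecord(line) {
  const trimmed = line.trim();
  if (trimmed === '') return null;
  try {
    return JSON.parse(trimmed);
  } catch (err) {
    return null;
  }
}

// Stream the gzip file line by line and count how often each top-level key
// appears in the first `limit` parsable records.
async function tallyKeys(path, limit) {
  const source = fs.createReadStream(path);
  const lines = readline.createInterface({
    input: source.pipe(zlib.createGunzip()),
    crlfDelay: Infinity,
  });
  const counts = {};
  let seen = 0;
  try {
    for await (const line of lines) {
      const record = parseRecord(line);
      if (record === null) continue;
      for (const key of Object.keys(record)) {
        counts[key] = (counts[key] || 0) + 1;
      }
      seen += 1;
      if (seen >= limit) break;
    }
  } finally {
    // Stop reading the rest of the (very large) file once we are done.
    lines.close();
    source.destroy();
  }
  return counts;
}
```

For example, tallyKeys('HTA_noduplicates.gz', 1000).then(console.log) would print the key counts for the first 1000 records, assuming the file has been downloaded to the working directory.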

The other files are referenced in the project document on the GitHub site. They represent the data at successive stages of transformation: from the original project files, which contained little more than the tweet ID, to the final product, which contains the full original Twitter JSON along with the geographic and census information listed above.

About the Code

This page uses the AWS SDK for JavaScript in the Browser to dynamically query the contents of the S3 bucket.

To keep things simple, the code does not ask for credentials. Instead, it makes unauthenticated calls to the S3 API. This means that it will only work against buckets that are publicly readable.
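To illustrate what "unauthenticated" means here, the sketch below lists a bucket with a plain HTTPS request to the S3 REST API (ListObjectsV2), with no credentials and no request signing. This is not how the page itself works, since the page goes through the SDK. The bucket name in the usage note is a placeholder, and listUrl and listAnonymously are hypothetical helper names.

```javascript
// Build the URL for an anonymous ListObjectsV2 request (virtual-hosted style).
function listUrl(bucket, region, prefix) {
  const url = new URL('https://' + bucket + '.s3.' + region + '.amazonaws.com/');
  url.searchParams.set('list-type', '2');
  if (prefix) url.searchParams.set('prefix', prefix);
  return url.toString();
}

// Fetch the listing without any credentials or request signing.
// Resolves to the raw XML listing returned by S3.
async function listAnonymously(bucket, region, prefix) {
  const response = await fetch(listUrl(bucket, region, prefix));
  if (!response.ok) {
    // A bucket that is not publicly readable typically answers with 403.
    throw new Error('Listing failed with HTTP ' + response.status);
  }
  return response.text();
}
```

Calling listAnonymously('example-bucket', 'us-west-1') should succeed only if the bucket allows public listing. When run from a browser, the bucket's CORS configuration must also permit the request.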

The JavaScript SDK makes it very simple to list the objects in an S3 bucket. The code:

- links to the JavaScript SDK
- configures the SDK with a default AWS region of us-west-1
- creates and initializes an S3 object
- via the S3 object, makes an unauthenticated listObjects call to S3
- iterates over the 'data' object returned and displays each file in a row of the DataTable
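
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the page's actual source. It assumes version 2 of the AWS SDK for JavaScript has been loaded with a script tag, so that a global AWS object exists. The bucket name in the usage note is a placeholder, and toRows and listBucket are hypothetical helper names. The AWS object is passed in as a parameter only to keep the sketch self-contained.

```javascript
// Turn a listObjects response into one [key, size, lastModified] row per file.
function toRows(data) {
  return (data.Contents || []).map(function (obj) {
    return [obj.Key, obj.Size, obj.LastModified];
  });
}

// AWS: the global provided by the SDK v2 browser script (assumed to be loaded).
// bucketName: name of a publicly readable bucket (placeholder in this sketch).
// onRows: callback invoked as onRows(err) or onRows(null, rows).
function listBucket(AWS, bucketName, onRows) {
  // Default region, as described above.
  AWS.config.region = 'us-west-1';

  // Create the S3 service object.
  var s3 = new AWS.S3();

  // Unauthenticated call: no credentials are looked up and nothing is signed.
  s3.makeUnauthenticatedRequest('listObjects', { Bucket: bucketName }, function (err, data) {
    if (err) {
      onRows(err);
      return;
    }
    onRows(null, toRows(data));
  });
}
```

In the browser this could be called as listBucket(AWS, 'example-bucket', callback), with the callback handing each row to the table. A single listObjects call returns at most 1000 keys, so a larger bucket would need paging.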