The use of social media data and data science to gain insights into health care and medicine
This website gives easy access to the data files created by George Fisher for this project. See https://github.com/grfiv/healthcare_twitter_analysis for details of the data and of the project as far as I took it.
If you click a file name it will download to your computer; if you right-click and select 'Copy link address' you can paste the link into a command or program (wget, for example).
The files listed below are hosted in an Amazon S3 bucket. They are displayed here in a web page that uses the AWS SDK for JavaScript in the Browser to read and list objects stored in the Amazon S3 bucket. You can view the source of this page to see the JavaScript that powers it (in Chrome, right-click and select 'View page source').
All of the tweets for this project have been processed and consolidated into a single file, HTA_noduplicates.gz.
1.85 GB compressed / 15.80 GB uncompressed
Each of the 4 million rows in this file is a tweet in JSON format.
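Because the file holds one JSON object per row, it can be read as a stream instead of being decompressed in full first. The following is a minimal sketch, assuming Node.js and a local copy of the download; `forEachTweet` is a hypothetical helper name, not something defined by the project.

```javascript
const fs = require('fs');
const zlib = require('zlib');
const readline = require('readline');

// Stream a gzipped file that holds one JSON object per line and
// call onTweet(object) for each row. Resolves to the number of rows read.
async function forEachTweet(path, onTweet) {
  const lines = readline.createInterface({
    // Decompress on the fly so the full file is never held in memory.
    input: fs.createReadStream(path).pipe(zlib.createGunzip()),
    crlfDelay: Infinity,
  });
  let count = 0;
  for await (const line of lines) {
    if (line.trim() === '') continue; // skip blank lines
    onTweet(JSON.parse(line));
    count += 1;
  }
  return count;
}

// Usage (assumes HTA_noduplicates.gz is in the current directory):
// forEachTweet('HTA_noduplicates.gz', function (tweet) { /* inspect tweet */ })
//   .then(function (n) { console.log(n + ' tweets read'); });
```

Streaming keeps memory use roughly constant regardless of file size, which matters for a file of this size once decompressed.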
The other files are referenced in the project document on the GitHub site. They represent the data at successive stages of transformation: the original project files contained essentially nothing but the tweet ID, while the final product contains the original full Twitter JSON along with a good deal of useful geographic and census information.
This page uses the AWS SDK for JavaScript in the Browser to dynamically query the contents of the S3 bucket.
To keep things simple, the code does not ask for credentials. Instead, it makes unauthenticated calls to the S3 API. This means that it works only against buckets that are publicly readable.
The JavaScript SDK makes it very simple to list the objects in an S3 bucket. The code:
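A minimal sketch of such an unauthenticated listing is given here; it is not the page's own script. It assumes an `AWS.S3` client from version 2 of the AWS SDK for JavaScript, passed in as `s3`. The function name `listAllObjects` and the bucket name in the usage comment are placeholders.

```javascript
// List every object in a publicly readable bucket without credentials.
// `s3` is assumed to be an AWS.S3 client (SDK v2); `done(err, files)` receives
// an array of { key, size } entries once all result pages have been fetched.
function listAllObjects(s3, bucket, done, token, found) {
  found = found || [];
  var params = { Bucket: bucket };
  if (token) params.ContinuationToken = token;

  // makeUnauthenticatedRequest sends the call unsigned, so it only
  // succeeds if the bucket allows public listing.
  s3.makeUnauthenticatedRequest('listObjectsV2', params, function (err, data) {
    if (err) return done(err);
    (data.Contents || []).forEach(function (obj) {
      found.push({ key: obj.Key, size: obj.Size });
    });
    // Results are paged; follow the continuation token until exhausted.
    if (data.IsTruncated) {
      return listAllObjects(s3, bucket, done, data.NextContinuationToken, found);
    }
    done(null, found);
  });
}

// Usage in the browser, with the SDK loaded via a <script> tag
// ('example-bucket' is a placeholder, not this site's bucket):
// listAllObjects(new AWS.S3(), 'example-bucket', function (err, files) {
//   if (err) { console.error(err); return; }
//   files.forEach(function (f) { console.log(f.key, f.size); });
// });
```

Passing the client in as an argument keeps the function independent of how the SDK is loaded, so the same code can be exercised with a stand-in object outside the browser.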
Amazon S3 Explorer
File | Size |
---|---|