Archiving Social Media Data

Documenting the Now
The Documenting the Now website offers user-friendly tools and resources for collecting and preserving digital content:
Hydrator | Twarc | Diff Engine | Tweet Catalog

This blurb from the Tweet Catalog page sums up the approach:

Twitter’s terms of service don’t allow tweet datasets to be published on the web, but they do allow tweet identifier datasets to be shared. This speaks to users’ rights as content creators, while also allowing researchers to share their data with others.

This site is a catalog of datasets that are publicly available on the web. If you would like to turn these tweet identifier datasets back into the original JSON, first download the dataset and then use the Hydrator desktop application, or Twarc if you are comfortable working at the command line.

You can add your own datasets to the catalog by following these instructions. If you’d like updates when datasets are added please subscribe to the RSS feed. All metadata listed here is licensed CC0. You may want to refer to our code of conduct if you have questions or concerns about the datasets we list here.
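To make the identifier-only idea concrete, here is a minimal Python sketch of the two halves of the process. It is not Hydrator's or Twarc's actual code. It assumes tweets are stored as line-delimited JSON (one tweet object per line) and that each tweet carries an `id_str` field, as in Twitter's v1.1 tweet format. The hypothetical `dehydrate` function reduces each tweet to its identifier, which is what a catalog like this one shares. The hypothetical `batches` function groups identifiers into chunks of 100, which we assume is the per-request limit of Twitter's tweet lookup endpoint that a hydration tool would call.

```python
import json
from typing import Iterable, Iterator, List


def dehydrate(jsonl_lines: Iterable[str]) -> Iterator[str]:
    """Yield the tweet ID from each line of line-delimited tweet JSON.

    Assumption: each non-blank line is one tweet object with an "id_str"
    field, as in Twitter's v1.1 tweet format.
    """
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        yield tweet["id_str"]


def batches(ids: Iterable[str], size: int = 100) -> Iterator[List[str]]:
    """Group IDs into lists of at most `size` items.

    Assumption: 100 IDs per lookup request, the limit we assume for the
    tweet lookup endpoint used when hydrating.
    """
    batch: List[str] = []
    for tweet_id in ids:
        batch.append(tweet_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter, batch


if __name__ == "__main__":
    sample = [
        '{"id_str": "1", "text": "first"}',
        "",
        '{"id_str": "2", "text": "second"}',
        '{"id_str": "3", "text": "third"}',
    ]
    ids = list(dehydrate(sample))
    print(ids)                         # ['1', '2', '3']
    print(list(batches(ids, size=2)))  # [['1', '2'], ['3']]
```

Only the list of IDs would be published. Anyone who wants the full tweets sends those IDs back to Twitter in batches, and tweets that have since been deleted or protected simply do not come back, which is part of how this approach respects users' choices.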

Bonus Material
Here are two recent articles that address the ethics of archiving data from Twitter as well as strategies for ethically archiving social media posts:

Towards an Ethical Framework for Publishing Twitter Data in Social Research: Taking into Account Users’ Views, Online Context and Algorithmic Estimation
M. Williams, P. Burnap and L. Sloan | Sociology
May 26, 2017

Archiving information from geotagged tweets to promote reproducibility and comparability in social media research
K. Kinder-Kurlanda, K. Weller and W. Zenk-Möltgen | Big Data & Society
November 1, 2017
