Twitter is a popular social-networking and messaging service, accessible through the web and mobile device apps, allowing users to write messages of up to 140 characters publicly or privately. Messages often include hyperlinks, which are passed through URL shorteners (so they may suffer linkrot if the shortening services go away). Some of the conventions of the service are discussed in the article on Hashtags, at-signs, retweets, etc.
Of interest to archivers is the fact that, as of late 2012, Twitter has started rolling out a feature to permit users to save their entire tweet history as an archive file.
The Data Transfer Project is developing tools for moving data between services such as this one.
Twitter is also one of the engulf-and-devour Internet megacorporations that have swallowed up, digested, and excreted other Internet services; a 2013 example is Posterous.
The unrelated system Twister is an open-source, encrypted, decentralized implementation of a similar concept to Twitter.
Downloaded Twitter archive
If you have been given the option to download your Twitter history (it is being rolled out gradually, so you may not have it yet, but probably will in the future), it appears as a button at the bottom of the "Settings" page in your account. Pressing it queues the generation of an archive of your tweets; when that finishes (which may take minutes or hours), you are e-mailed at the address registered to the account with a link to retrieve your archive. There, you can download it as a ZIP archive (tweets.zip) containing this file and directory structure:
- README.txt: an ASCII text file (with long lines that scroll way off to the right if your text viewer doesn't wrap long lines) giving some information about the format
- index.html: HTML file which, when loaded in a browser, lets you view your tweets. The tweets themselves aren't actually in this file, but it pulls in a bunch of JavaScript from the subdirectories, which in turn load the tweets from data files.
- css: Subdirectory with Cascading Style Sheets.
  - application.min.css: Stylesheet (formatted in a hard-to-read manner with no line breaks).
- data: Subdirectory with data files.
  - csv: Subdirectory with CSV files.
    - YYYY_MM.csv: A series of files named by year and month with the tweets in the form of comma-separated values (CSV). The columns are: "tweet_id", "in_reply_to_status_id", "in_reply_to_user_id", "retweeted_status_id", "retweeted_status_user_id", "timestamp", "source", "text", "expanded_urls". The timestamp is in UTC time, in the format YYYY-MM-DD HH:MM:SS +0000.
  - js: Subdirectory with JavaScript (user-specific, encoding details about the tweets).
    - payload_details.js
    - tweet_index.js
    - user_details.js
    - tweets
      - YYYY_MM.js: A series of files named by year and month with the tweets in JSON form, with a one-line header turning each file into a JavaScript variable assignment. (Strip that line if using the JSON data elsewhere.)
- csv: Subdirectory with CSV files.
- img: Subdirectory with graphics.
- js: Subdirectory with JavaScript.
  - application.min.js: Script used in displaying tweets (formatted in a hard-to-read manner with no line breaks).
- lib: Subdirectory with various 'library' files used by the scripts.
  - bootstrap: various JavaScript, CSS, and graphics.
  - hogan: Contains another JavaScript file.
  - jquery: Contains another JavaScript file.
  - twt: Contains some more JavaScript, CSS, and graphics.
  - underscore: Contains another JavaScript file.
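As a rough illustration of how the two tweet data formats described above might be read outside the bundled viewer, here is a minimal Python sketch. It is not taken from Twitter's documentation. The function names are hypothetical, the sample data is made up, and it assumes that each YYYY_MM.csv file begins with a header row naming the columns listed above and that everything after the first line of a YYYY_MM.js file is plain JSON.

```python
import csv
import io
import json
from datetime import datetime


def parse_csv_tweets(csv_text):
    """Parse the text of one YYYY_MM.csv file into a list of dicts.

    Assumption: the first row is a header naming the columns
    ("tweet_id", "timestamp", "text", ...).
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text, newline="")):
        # Timestamps are UTC, e.g. "2013-01-05 14:30:00 +0000".
        row["timestamp"] = datetime.strptime(
            row["timestamp"], "%Y-%m-%d %H:%M:%S %z"
        )
        rows.append(row)
    return rows


def parse_js_tweets(js_text):
    """Parse the text of one YYYY_MM.js file.

    The first line is the JavaScript variable assignment; it is dropped
    and the remainder is assumed to be plain JSON.
    """
    _header, _newline, body = js_text.partition("\n")
    return json.loads(body)


if __name__ == "__main__":
    # Made-up sample data in the shape described above.
    sample_csv = (
        '"tweet_id","in_reply_to_status_id","in_reply_to_user_id",'
        '"retweeted_status_id","retweeted_status_user_id",'
        '"timestamp","source","text","expanded_urls"\n'
        '"1","","","","","2013-01-05 14:30:00 +0000","web","hello",""\n'
    )
    sample_js = 'var hypothetical_name = \n[ {"id": 1, "text": "hello"} ]'

    for tweet in parse_csv_tweets(sample_csv):
        print(tweet["timestamp"].isoformat(), tweet["text"])
    for tweet in parse_js_tweets(sample_js):
        print(tweet["id"], tweet["text"])
```

Both functions take the file contents as a string rather than a path, so they can be fed from files extracted on disk or read directly out of tweets.zip with the standard `zipfile` module.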
TwitPic
TwitPic, a popular hosting service for pictures used on Twitter, was not affiliated with Twitter. It abruptly shut down in 2014 because Twitter objected, after years of TwitPic's operation, to its use of a name resembling Twitter's. Users suddenly had to export their images from the service to save them from deletion.
Documentation
- Twitter API info
- How Twitter counts characters (official Twitter documentation)
- Discussion of 'Grailbird' JavaScript object used in Twitter archive
Software
- Python script to create 'Tweet this' link that doesn't require JavaScript
- Emojibot: Twitter bot that automatically translates your tweets into emoji, via the magic of Mechanical Turk
- Social Feed Manager
- A utility for loading tweets into elasticsearch (Python)
- Social feed harvester (Python)
- twarc: A command line tool for archiving Twitter JSON
- linkerd: Twitter-style Operability for Microservices
- Hydrator: Turn Tweet IDs into Twitter JSON from your desktop!
Other links and references
- Twitter (official site)
- Wikipedia article on Twitter
- Librarians of the Twitterverse
- How to become internet famous for $68
- Science-fictional Twitter bug report
- Miscellaneous symbols in Unicode (useful to copy and paste for tweets)
- Twitter files for IPO
- The Inventor Of The Twitter Hashtag Explains Why He Didn't Patent It
- Forward secrecy at Twitter
- Twitter Guide Book – How To, Tips and Instructions by Mashable
- Spring Cleaning Who Has Access to Your Data
- Twaggies: cartoons inspired by tweets
- Twitter Still Has An Identity Problem Eight Years Later
- How to post GIFs on Twitter
- What Twitter Isn’t Telling You About GIFs
- Twitter Now Showing You Even More Tweets from People You Don’t Follow
- On archiving tweets
- Building a complete tweet index
- Harvesting the Twitter Streaming API to WARC files
- Twitter API apparently sometimes gives tweets from accounts not followed