Yahoo Groups

From Just Solve the File Format Problem
File Format
Name Yahoo Groups
Ontology

Yahoo Groups is an email list service run by Yahoo!. Until 2019 it also included web-readable forums and file areas; these were discontinued that year, leaving only the email-based features. When those features (and the online archives of the messages) were discontinued, users were given the opportunity to download an archive for a limited time.

ArchiveTeam is attempting to archive parts of its content, though much of it is marked as private and hence inaccessible to outside users.

Downloaded archive

When you use the Get My Data feature, you are told to wait for an email notification that the archive is complete. When the notification arrives (possibly weeks later), you can download the file, which has the following format:

The file is a ZIP archive with a long, cryptic name made up of seemingly random digits (probably hexadecimal, since the letters a-f appear in it).

Within it, the first layer of subdirectories consists of the names of the groups being archived; the download covers every group you're a member of, whether or not you're a group owner.

Beneath that, there are more ZIP archives, one for each of the categories of things being archived, such as files.zip, links.zip, and messages.zip.
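The nesting described above (outer ZIP, then one directory per group, then per-category ZIPs) can be walked with Python's standard zipfile module. This is only a minimal sketch: the function names are made up for illustration, and it assumes the group name is always the first path component of each entry.

```python
import io
import zipfile
from collections import defaultdict


def list_group_archives(outer):
    """Map each group directory to the ZIP files found beneath it.

    `outer` is a path or file-like object for the downloaded archive.
    Assumption: the first path component of every entry is the group name.
    """
    groups = defaultdict(list)
    with zipfile.ZipFile(outer) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            parts = info.filename.split("/")
            if len(parts) >= 2 and parts[-1].lower().endswith(".zip"):
                groups[parts[0]].append(parts[-1])
    return {name: sorted(files) for name, files in groups.items()}


def open_inner_zip(outer, member):
    """Open a nested ZIP (e.g. "somegroup/files.zip") by reading it into memory."""
    with zipfile.ZipFile(outer) as zf:
        return zipfile.ZipFile(io.BytesIO(zf.read(member)))
```

Reading the inner archive into memory avoids extracting the outer archive first, but for a very large files.zip it may be more practical to extract to disk instead.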

The files.zip archive contains all the files from the file area, including subdirectory structures when the files are organized in folders.

The links.zip archive has the web links from the links section of a group, in Internet Shortcut format.
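Internet Shortcut files are INI-style text with the target stored under a URL key in an [InternetShortcut] section, so the links can probably be recovered with the standard configparser module. The sketch below makes two assumptions that the archive itself does not guarantee: that every file in links.zip is a shortcut, and that the text decodes as UTF-8. The function names are illustrative.

```python
import configparser
import io
import zipfile


def read_shortcut_url(text):
    """Return the URL= value from Internet Shortcut text, or None if absent."""
    # interpolation=None keeps "%" escapes in URLs from being misread.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    try:
        parser.read_string(text)
    except configparser.Error:
        return None
    return parser.get("InternetShortcut", "URL", fallback=None)


def links_from_zip(links_zip):
    """Map each shortcut file name in links.zip to its target URL."""
    links = {}
    with zipfile.ZipFile(links_zip) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Assumed encoding; undecodable bytes are replaced, not fatal.
            text = zf.read(info).decode("utf-8-sig", errors="replace")
            url = read_shortcut_url(text)
            if url is not None:
                links[info.filename] = url
    return links
```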

The messages.zip archive has one or more files in mbox format containing the messages from the group in chronological order, with names ending in .00001, .00002, etc. The messages are split across files of approximately 2.3 megabytes each. At least in older archives, you can usually read through them by opening them in a text editor; more recent ones are harder to read because of the prevalence of HTML-format messages and the utter inability of modern email users to trim quoted material, meaning that the archives are full of raw markup and excessive repetitive quotage.
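Because the parts are mbox files, Python's standard mailbox module should be able to read them, which sidesteps some of the HTML clutter by picking out plain-text parts where they exist. The following is a sketch under assumptions: messages.zip has already been extracted into a directory, every file in that directory is an mbox part, and sorting by file name yields the .00001, .00002, ... order. The function names are hypothetical.

```python
import mailbox
from pathlib import Path


def iter_messages(directory):
    """Yield (file name, message) for each message in the extracted parts.

    Assumption: every file in `directory` is one mbox part, and sorting
    by name gives the chronological part order.
    """
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        box = mailbox.mbox(str(path), create=False)
        try:
            for message in box:
                yield path.name, message
        finally:
            box.close()


def plain_text_body(message):
    """Return the first text/plain part, or None for HTML-only messages."""
    for part in message.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            return payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset label; fall back rather than fail.
            return payload.decode("utf-8", errors="replace")
    return None
```

For example, `for name, msg in iter_messages("messages"): print(name, msg["Subject"])` would list subjects in order, where "messages" is whatever directory the parts were extracted to.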

Sometimes there's also a medias.json file giving some sort of information in JSON format.
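Since the layout of medias.json is not documented here, one way to start describing it is to summarize its shape (keys and value types) without assuming any particular fields. The helper below is a generic sketch, not a parser for a known schema; the name describe and the depth limit are arbitrary choices.

```python
import json


def describe(value, depth=0, max_depth=3):
    """Summarize the shape of a parsed JSON value: keys and types, not data."""
    if isinstance(value, dict):
        if depth >= max_depth:
            return "object"
        return {key: describe(item, depth + 1, max_depth) for key, item in value.items()}
    if isinstance(value, list):
        if not value or depth >= max_depth:
            return f"array[{len(value)}]"
        # Only the first element is sampled; later elements may differ.
        return [describe(value[0], depth + 1, max_depth), f"... {len(value)} item(s)"]
    return type(value).__name__
```

It could be applied as `describe(json.load(open("medias.json", encoding="utf-8")))`, assuming the file is UTF-8 encoded.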

Links
