Dataset: SocialBM0311
SocialBM0311 is a large-scale social tagging/bookmarking dataset collected from Delicious. It contains the complete bookmarking activity of almost 2 million users, from the launch of the social bookmarking website in 2003 to the end of March 2011. The dataset contains:
- 118,520,382 unique URLs
- 14,723,731 unique tags
- 1,951,207 users
The files contain one bookmark per line, with the following fields separated by tabs:
url_md5 user_id url unix_timestamp tags
- 'url_md5' is the MD5 hash of the bookmarked URL. Note that Delicious uses the MD5 hash as the ID for URLs, so the hash can be used to find the URL on the site.
- 'user_id' is the ID for the user who saved the bookmark. The usernames have been fully anonymized for this dataset, and the user IDs provided with the dataset have been randomly assigned to users.
- 'url' is the URL being bookmarked.
- 'unix_timestamp' is the date on which the bookmark was saved, in the standard UNIX time format. Note that these timestamps are rounded to days and do not provide the specific time of day (a limitation of the system during data collection).
- 'tags' is a tab-separated list of the tags (keywords) used in the bookmark.
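Because both the fields and the tags are separated by tabs, the first four fields on a line are fixed and everything after them is a tag. The following is a minimal Python sketch of how one line might be parsed under that reading. The names Bookmark, parse_bookmark and md5_matches are hypothetical helpers, not part of the dataset. The sketch assumes that timestamps can be interpreted as UTC and that 'url_md5' is the hexadecimal MD5 digest of the UTF-8 encoded URL string; neither assumption is confirmed by the description above.

```python
import hashlib
from datetime import date, datetime, timezone
from typing import List, NamedTuple


class Bookmark(NamedTuple):
    """One parsed line of the dataset (hypothetical helper type)."""
    url_md5: str
    user_id: str
    url: str
    saved_on: date
    tags: List[str]


def parse_bookmark(line: str) -> Bookmark:
    # The first four fields are fixed; every remaining tab-separated
    # field is treated as a tag, so a bookmark may have zero or more tags.
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 fields, got {len(fields)}")
    url_md5, user_id, url, timestamp = fields[:4]
    # Assumption: timestamps are interpreted as UTC. They are rounded to
    # days, so only the calendar date is kept.
    saved_on = datetime.fromtimestamp(int(timestamp), tz=timezone.utc).date()
    # Drop empty strings that would result from stray trailing tabs.
    tags = [tag for tag in fields[4:] if tag]
    return Bookmark(url_md5, user_id, url, saved_on, tags)


def md5_matches(bookmark: Bookmark) -> bool:
    # Assumption: the ID is the hex MD5 digest of the UTF-8 encoded URL.
    digest = hashlib.md5(bookmark.url.encode("utf-8")).hexdigest()
    return digest == bookmark.url_md5
```

The user ID is kept as a string here because the description does not state whether the IDs are numeric.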
Legal Information
By downloading and using this dataset you acknowledge that:
- The data has been compiled exclusively for scientific research purposes.
- The copyright holders retain ownership and reserve all rights.
Please cite the following paper if you make use of this dataset in your research work:
Arkaitz Zubiaga, Victor Fresno, Raquel Martinez, Alberto Perez Garcia-Plaza,
Harnessing Folksonomies to Produce a Social Classification of Resources,
IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 8, pp. 1801-1813, Aug. 2013, doi:10.1109/TKDE.2012.115
The dataset (42 GB after decompressing) is provided in 2 different compression formats (download just one; both contain the same file):
- socialbm0311.7z (9.4 GB)
- socialbm0311.tar.bz2 (11 GB)
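Since the decompressed file is 42 GB, it may be convenient to read the bookmarks directly from the .tar.bz2 archive without extracting it first. The following Python sketch uses the standard tarfile module in streaming mode for that purpose. The function name iter_bookmark_lines is hypothetical. The name of the file inside the archive is not stated here, so the sketch simply reads every regular file it finds, and it assumes the text is UTF-8 encoded (undecodable bytes are replaced rather than raising an error).

```python
import tarfile
from typing import Iterator


def iter_bookmark_lines(archive_path: str) -> Iterator[str]:
    """Yield one bookmark line at a time from a .tar.bz2 archive."""
    # "r|bz2" reads the archive as a forward-only stream, so nothing is
    # extracted to disk and memory use stays bounded.
    with tarfile.open(archive_path, mode="r|bz2") as archive:
        for member in archive:
            if not member.isfile():
                continue
            raw = archive.extractfile(member)
            if raw is None:
                continue
            for raw_line in raw:
                # Assumption: the data is UTF-8 text.
                text = raw_line.decode("utf-8", errors="replace")
                yield text.rstrip("\r\n")
```

A call such as iter_bookmark_lines("socialbm0311.tar.bz2") would then produce the lines one by one, each of which can be split on tabs as described in the field list above.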