Dataset: Social-ODP-2k9
Social-ODP-2k9 is a dataset created during December 2008 and January 2009 with data retrieved from the social bookmarking sites Delicious and StumbleUpon, the Open Directory Project and the Web. It is available for research purposes.
Statistics
This dataset is made up of 12,616 unique URLs, each with its corresponding social annotations:
- Data from Delicious:
  - Number of users annotating it*.
  - Top 10 list of tags*.
  - Full Tag Activity (FTA)*.
  - Notes*.
- Data from StumbleUpon:
  - Reviews.
Moreover, the category for each URL, extracted from the Open Directory Project, is also available.
If you want to learn more about the dataset generation process, please read the paper referenced at the end of this page.
Metadata Format
All the metadata for the dataset documents is provided in XML format, following this pattern:
<documents>
  ...
  <document>
    <hash>MD5 hash for document's URL</hash>
    <url>Document's URL</url>
    <category>ODP Category</category>
    <usercount>Number of users annotating it</usercount>
    <tags>
      ...
      <tag>
        <name>Tag name</name>
        <count># of users who annotated the tag</count>
      </tag>
      ...
    </tags>
    <reviews>
      ...
      <review>A review from StumbleUpon</review>
      ...
    </reviews>
    <notes>
      ...
      <note>A note from Delicious</note>
      ...
    </notes>
    <detailedtags>
      ...
      <user>
        ...
        <tag>Tags assigned by a user</tag>
        ...
      </user>
      ...
    </detailedtags>
  </document>
  ...
</documents>
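The pattern above can be read with a streaming XML parser. The following is a minimal Python sketch, not an official loader: the function name `iter_documents` and all sample values (hash, URL, category, tags) are invented for illustration, and it assumes that `usercount` and `count` hold plain integers and that each `<tag>` inside `<user>` holds a single tag.

```python
import io
import xml.etree.ElementTree as ET


def iter_documents(source):
    """Yield one dict per <document> element, parsing incrementally."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "document":
            continue
        yield {
            "hash": elem.findtext("hash"),
            "url": elem.findtext("url"),
            "category": elem.findtext("category"),
            # Assumption: usercount and count are plain integers.
            "usercount": int(elem.findtext("usercount") or 0),
            # Top tags: tag name -> number of users who used it.
            "tags": {
                t.findtext("name"): int(t.findtext("count") or 0)
                for t in elem.findall("tags/tag")
            },
            "reviews": [r.text or "" for r in elem.findall("reviews/review")],
            "notes": [n.text or "" for n in elem.findall("notes/note")],
            # Full Tag Activity: one list of tags per annotating user.
            "detailedtags": [
                [t.text or "" for t in user.findall("tag")]
                for user in elem.findall("detailedtags/user")
            ],
        }
        elem.clear()  # release the subtree once it has been consumed


# Made-up sample that follows the documented pattern.
sample = b"""<documents>
  <document>
    <hash>0123456789abcdef0123456789abcdef</hash>
    <url>http://example.org/</url>
    <category>Top/Computers/Programming</category>
    <usercount>40</usercount>
    <tags>
      <tag><name>python</name><count>12</count></tag>
      <tag><name>programming</name><count>7</count></tag>
    </tags>
    <reviews><review>Clear introduction.</review></reviews>
    <notes><note>Good reference.</note></notes>
    <detailedtags>
      <user><tag>python</tag><tag>tutorial</tag></user>
      <user><tag>programming</tag></user>
    </detailedtags>
  </document>
</documents>"""

docs = list(iter_documents(io.BytesIO(sample)))
print(docs[0]["url"], docs[0]["tags"])
```

For the real metadata, pass the path of the extracted XML file instead of the in-memory sample. Incremental parsing is used here so that the whole file does not have to be held in memory at once.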
Legal Information
By downloading and using this dataset you acknowledge that:
- The data has been compiled exclusively for scientific research purposes.
- The copyright holders retain ownership and reserve all rights.
Reference
Please consider citing the following paper if you use this dataset in your research:
Arkaitz Zubiaga, Raquel Martínez, and Víctor Fresno. Getting the Most Out of Social Annotations for Web Page Classification. Proceedings of DocEng 2009, the 9th ACM Symposium on Document Engineering, pp. 74-83, Munich, Germany. 2009.
BibTeX:
@inproceedings{zubiaga2009getting,
title={Getting the Most Out of Social Annotations for Web Page Classification},
author={Zubiaga, Arkaitz and Mart{\'\i}nez, Raquel and Fresno, V{\'\i}ctor},
booktitle={Proceedings of the 9th ACM symposium on Document engineering},
pages={74--83},
year={2009},
organization={ACM}
}
Download
- social-odp-2k9_annotations.tar.bz2 (59 MB): Contains all the URLs making up the collection, as well as their corresponding social annotations.
- social-odp-2k9_documents.tar.bz2 (93 MB): Content for all the web documents in the dataset.
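Both archives are bzip2-compressed tarballs, so they can be unpacked with Python's standard `tarfile` module. The sketch below is one possible way to do it; the helper name `extract_archive` and the destination directory are hypothetical, and the internal layout of the archives is not documented here, so the function simply returns whatever member names it finds.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.bz2 archive into dest_dir and return the member names."""
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(archive_path, mode="r:bz2") as tar:
        for member in tar:
            target = (dest / member.name).resolve()
            # Skip any member that would be written outside dest_dir.
            if not target.is_relative_to(dest):
                continue
            tar.extract(member, dest)
            extracted.append(member.name)
    return extracted


# Example call (hypothetical destination directory):
# names = extract_archive("social-odp-2k9_annotations.tar.bz2", "social-odp-2k9")
```

`Path.is_relative_to` requires Python 3.9 or later. The path check is a precaution that is generally advisable when unpacking any downloaded archive.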