Yahoo Reveals a Massive Flickr Dataset, Plans to Launch a Supercomputer to Analyze it

Yahoo has released a massive Flickr dataset containing URLs for 99.3 million photos and 700,000 videos, along with their metadata. The dataset is intended for researchers to use in experiments.

Yahoo has partnered with the International Computer Science Institute (ICSI) at Berkeley and Lawrence Livermore National Laboratory to compute many open, standardized computer vision and audio features, and to host the data, more than 50 terabytes in all, on a specialized supercomputer for analysis. The supercomputer will be available on Amazon Web Services later this summer.

David Ayman Shamma, senior research manager at Yahoo! Labs, said, “At Flickr and at Yahoo Labs, we set out to provide something more substantial for researchers around the globe. Today, we are announcing the Flickr Creative Commons dataset as part of Yahoo Webscope’s datasets for researchers.

“The dataset (about 12GB) consists of a photo_id, a JPEG URL or video URL, and some corresponding metadata such as the title, description, camera type, and tags. Plus, about 49 million of the photos are geotagged! What’s not there, like comments, favourites, and social network data, can be queried from the Flickr API.”
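For readers who want a feel for what one entry might look like in practice, here is a minimal Python sketch of reading a single record. The announcement names the fields but not the file layout, so the tab-separated format, the column order in `FIELDS`, the comma-separated tags, and the `MediaRecord` and `parse_record` names are all assumptions made for illustration, not the dataset's documented format.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical column order: the announcement lists these fields but does not
# specify how they are laid out in the released files.
FIELDS = ["photo_id", "url", "title", "description",
          "camera", "tags", "longitude", "latitude"]


@dataclass
class MediaRecord:
    photo_id: str
    url: str
    title: str
    description: str
    camera: str
    tags: List[str]
    longitude: Optional[float]
    latitude: Optional[float]

    @property
    def is_geotagged(self) -> bool:
        # A record counts as geotagged only when both coordinates are present.
        return self.longitude is not None and self.latitude is not None


def _to_float(value: str) -> Optional[float]:
    """Convert a coordinate string to float, treating blanks as missing."""
    value = value.strip()
    return float(value) if value else None


def parse_record(line: str) -> MediaRecord:
    """Parse one tab-separated line (assumed layout) into a MediaRecord."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    row = dict(zip(FIELDS, parts))
    return MediaRecord(
        photo_id=row["photo_id"],
        url=row["url"],
        title=row["title"],
        description=row["description"],
        camera=row["camera"],
        # Assumed: tags are comma-separated; empty strings are dropped.
        tags=[t for t in row["tags"].split(",") if t],
        longitude=_to_float(row["longitude"]),
        latitude=_to_float(row["latitude"]),
    )
```

Anything not covered by these fields, such as comments or favourites, would have to be fetched separately from the Flickr API, as the quote above notes.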

More from the blog post: “On Flickr, photos, their metadata, their social ecosystem, and the pixels themselves make for a vibrant environment for answering many research questions at scale. However, scientific efforts outside of industry have relied on various sized efforts of one-off datasets for research.

“The dataset can host a variety of research studies and challenges. One of the first challenges we are issuing is the MediaEval Placing Task, where the task is to build a system capable of accurately predicting where in the world the photos and videos were taken without using the longitude and latitude coordinates. This is just the start. We plan to create new challenges through expansion packs that will widen the scope of the dataset with new tasks like object localization, concept detection, and social semantics.”
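To make "accurately predicting where in the world" a little more concrete, the sketch below shows one plausible way to score a location prediction: measure the great-circle distance between the predicted and the true coordinates, then report the share of predictions that land within a chosen radius. The blog post does not say how the Placing Task is scored, so the haversine formula, the spherical-Earth radius, and the `haversine_km` and `fraction_within` names are assumptions used here for illustration only.

```python
import math

# Mean Earth radius in kilometres (spherical approximation).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def fraction_within(predicted, actual, radius_km):
    """Share of predictions that fall within radius_km of the true location.

    predicted and actual are equal-length sequences of (lat, lon) pairs.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        return 0.0
    hits = sum(
        1
        for (plat, plon), (alat, alon) in zip(predicted, actual)
        if haversine_km(plat, plon, alat, alon) <= radius_km
    )
    return hits / len(predicted)
```

A system under test would only see the photo or video and its non-geographic metadata; the held-back longitude and latitude would serve as the `actual` values when scoring.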
