Product · August 06, 2020

Unsplash’s dataset is now open source

The most complete high-quality open image dataset ever released.

Luke Chesser · Timothy Carbone · Ali Zahid

When we first released the Unsplash API in 2016, we never dreamed that it would become as popular and useful as it has.

200,000 developers. 5B API requests per month. Native integrations inside Squarespace, Dropbox, BuzzFeed, Medium, Adobe, Wix, Figma, Notion, Trello, and Facebook, plus thousands of others.

What started as a low-key, late-night Slack exchange ('Wouldn’t it be cool if we made an API?') turned into one of the world's most used APIs, bringing 2 million open images from the Unsplash community directly into the workflows of creators and enhancing over 1 billion creations.

Earlier this year, we had a similar moment on our team Slack: wouldn’t it be cool if we made the data we use to run Unsplash open for anyone to use?

Today we do just that.

We’re releasing the most complete high-quality open image dataset ever, free for anyone to use to further research in machine learning, image quality, search engines, and more.

While other open source image datasets exist, they’re usually limited in size, expose low-quality images, lack variability in the image data, or rely on mass labeling by third-party services.

With more than 200,000 contributing photographers worldwide and data sourced from hundreds of millions of searches across a nearly unlimited number of uses and contexts, the breadth of intent and semantics contained within the Unsplash dataset opens up entirely new use cases.

In total, the dataset contains over 2M high-quality images, with 16GB of accompanying data covering:

  • keyword-image conversions in search results
  • community- and AI-generated keywords
  • EXIF, location, and landmarks
  • image categories and subcategories
  • user-generated collections and groupings of images
  • image view and download stats
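
To give a feel for what working with the keyword-image conversion data might look like, here is a minimal Python sketch. It assumes the data arrives as tab-separated text with one row per photo and keyword pair; the column names (`photo_id`, `keyword`, `conversions`) and the sample rows are made up for illustration and are not the dataset's documented schema, so check the documentation for the real field names.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows in tab-separated form. The column names are
# assumptions for illustration, not the dataset's documented schema.
SAMPLE_TSV = (
    "photo_id\tkeyword\tconversions\n"
    "p1\tmountain\t12\n"
    "p1\tsnow\t3\n"
    "p2\tmountain\t5\n"
    "p2\tlake\t9\n"
)


def conversions_by_keyword(tsv_text):
    """Sum the conversion counts for each keyword across all photos."""
    totals = defaultdict(int)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        totals[row["keyword"]] += int(row["conversions"])
    return dict(totals)


def top_keywords(tsv_text, n=2):
    """Return the n keywords with the most conversions, highest first.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    totals = conversions_by_keyword(tsv_text)
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    print(top_keywords(SAMPLE_TSV))
```

For a real file, the same reader could wrap an open file handle instead of `io.StringIO`; the aggregation logic would stay the same once the column names are swapped for the documented ones.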

Of course, all of the data is completely anonymous and private (except for attribution to the original contributor).

We’re releasing the data in two versions: a lite dataset available for commercial and noncommercial usage, and the full dataset available for noncommercial usage. As the Unsplash library continues to double in size every year, we’ll continue updating the dataset with new fields and new images.

Like when we first released the API, we have some ideas of how the community might use this data, but we're excited to see the creativity of researchers and developers as they dream up new uses.

Visit the Unsplash Dataset to access the datasets, and see GitHub for the documentation. We'd also appreciate your help sharing this news as far and wide as possible so that every researcher and developer can use the dataset.
