January 11, 2021
Data

9 Great Resources for Finding Data

Where can I find datasets to classify?

1. Hugging Face Datasets

As of writing, Hugging Face Datasets contains over 600 datasets in 80+ languages, all of which can be browsed by tag in the dataset viewer.

https://huggingface.co/datasets/viewer/
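
Once a dataset looks promising in the viewer, a quick check of its label balance helps decide whether it suits a classification task. The following is a minimal sketch, not an official recipe: it assumes the `datasets` Python package is installed, and the helper names `load_split` and `label_counts` as well as the `label` column name are illustrative choices, not part of the library.

```python
from collections import Counter


def load_split(name, split="train"):
    """Download one split of a Hub dataset.

    Assumes `pip install datasets` and network access.
    """
    # Imported lazily so label_counts() works without the library installed.
    from datasets import load_dataset

    return load_dataset(name, split=split)


def label_counts(examples, label_key="label"):
    """Count how often each label occurs in an iterable of dict-like examples."""
    # The column name "label" is an assumption; check the dataset's features.
    return Counter(example[label_key] for example in examples)
```

For example, `label_counts(load_split("imdb"))` should report how many reviews carry each label, assuming the `imdb` dataset is still published under that name with a `label` column.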

2. Data.world

Data.world hosts a huge amount of data, so a lot can be found here.

https://data.world/

3. Kaggle

Kaggle hosts over 60,000 datasets.

https://www.kaggle.com/datasets

4. The Pile

The Pile consists of roughly 825 GiB of English text data drawn from a wide variety of domains.

https://pile.eleuther.ai/

5. Pushshift

An API for querying Reddit and other social media data.

https://pushshift.io/

6. Stack Overflow

Search and fetch Stack Overflow posts with the Stack Exchange Data Explorer.

https://data.stackexchange.com/stackoverflow/query/new

7. Data.gov

Find data published by the US government.

https://catalog.data.gov/dataset

8. Google Dataset Search

Search for datasets with Google.

https://datasetsearch.research.google.com/

9. OPUS

A great source for multilingual content ranging from subtitles to legal texts.

http://opus.nlpl.eu/

Viktor Alm

CEO & Co-Founder @ Labelf AI
