Where can I find datasets for classification?
1. Hugging Face Datasets
At the time of writing, Hugging Face Datasets hosts over 600 datasets in more than 80 languages, all of which can be browsed by tag in the dataset viewer.
https://huggingface.co/datasets/viewer/
2. Data.world
It hosts a huge amount of data, so a lot can be found here.
https://data.world/
3. Kaggle
Kaggle hosts over 60,000 datasets.
https://www.kaggle.com/datasets
4. The Pile
The Pile consists of 825 GiB of English text drawn from 22 diverse sources covering a wide variety of domains.
5. Pushshift
An API for querying Reddit and other social media data.
6. Stack Overflow
Search and fetch Stack Overflow posts with SQL queries in the Stack Exchange Data Explorer.
https://data.stackexchange.com/stackoverflow/query/new
7. Data.gov
Find data published by the US government.
https://catalog.data.gov/dataset
8. Google Dataset Search
Search for datasets with Google.
https://datasetsearch.research.google.com/
9. OPUS
A great source of multilingual (parallel) content, ranging from movie subtitles to legal texts.
https://opus.nlpl.eu/
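For option 1, the Hugging Face `datasets` library can pull a labeled dataset straight into Python. The sketch below is minimal and assumes the library is installed (`pip install datasets`) and that the machine has network access. The helper names `load_split` and `class_counts` are hypothetical, and `"imdb"` is only an example of a classification dataset on the Hub. It loads one split and counts examples per class, a quick check of class balance before training a classifier.

```python
from collections import Counter


def load_split(name: str, split: str = "train"):
    """Download one split of a Hugging Face dataset (needs the `datasets` package and network access)."""
    # Imported lazily so class_counts below can be used without the library installed.
    from datasets import load_dataset

    return load_dataset(name, split=split)


def class_counts(labels, label_names):
    """Count examples per class.

    Assumes labels are integer ids 0..n-1, as produced by a `ClassLabel`
    feature, and that label_names[i] is the readable name for id i.
    Classes with no examples are reported with a count of 0.
    """
    counts = Counter(labels)
    return {label_names[i]: counts.get(i, 0) for i in range(len(label_names))}
```

For example, `ds = load_split("imdb")` followed by `class_counts(ds["label"], ds.features["label"].names)` returns a dict keyed by the dataset's class names (`neg` and `pos` for IMDb). Keeping the counting helper separate from the download makes it reusable for data fetched from any of the other sources above.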