Something-Something V2 Release
Something-Something is a large-scale video initiative for teaching machines common sense about the physical world. Today, TwentyBN releases Something-Something V2, which is twice the size of V1 and comes with many improvements and new features. A technical report on Something-Something V1 can be found here. Recent work on Something-Something V2 can be found here. If you are interested in commercial licenses, please contact us.
At TwentyBN, our unique video data is a core component that powers our groundbreaking technology, which in turn enables companies to equip edge devices with human-like sensing skills in different industry verticals, such as smart homes, automotive, and healthcare. The large-scale Something-Something dataset is TwentyBN’s contribution to deep learning research in computer vision and visual common sense. To this end, the dataset contains video clips accompanied by precise textual descriptions of what is happening in the clips, such as “Pretending to put [something] onto [something]”.
Originally released in June 2016, Something-Something V1 has undergone rapid growth and improvements over the past year. Today, we are proud to announce the release of Something-Something V2, which comes with many improvements and new features. The dataset can be found here and is free for academic research. The next section describes the most exciting updates and new features in this release.
Enhancements in Version 2
Some of the most noteworthy updates in Something-Something V2 are:
- Size: More than doubled. The V2 dataset now contains 220,847 videos as compared to 108,499 clips in V1. Moreover, the total size of all public data released by TwentyBN is now over 370,000 videos.
- Label Quality: Greatly reduced label noise in the dataset. For the new release, we used sophisticated, automated rating mechanisms to strengthen the video quality verification process.
- Video Resolution: At the request of the research community, the video height has been increased to 240px, compared to 100px in V1.
- Data Format: The download format for Something-Something is now WebM with VP9 encoding, instead of the JPEG images used in V1. This results in a much smaller download size of 19.4 GB.
- Verbs and Nouns: While Something-Something’s omission of objects aims to focus research on verbs, the V2 release also publishes the nouns behind these “somethings”. For example, for a label like “Putting [something] onto [something]” there is also an annotated version like “Putting a cup onto a table”. In total, there are 318,572 annotations involving 30,408 unique objects. You can now use Something-Something for both classification and captioning tasks.
- Baseline Model: We are releasing not only a larger and better video dataset but also a baseline model to help beginners kick off their deep learning research and explore better model structures. The baseline model can be found on GitHub.
- Two Leaderboards: We will maintain two leaderboards so that you can submit your results for both classification and captioning. For captioning, our researchers have developed a metric to rank submissions. In addition, each leaderboard will come with two columns, one for top-1 accuracy and another for top-5.
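For readers new to the top-1 and top-5 columns mentioned above, the following is a minimal sketch of how such accuracies are commonly computed. It uses the standard definition (a prediction counts as correct if the true class is among the k highest-scoring classes) and is not the leaderboard's official evaluation code. The function name `top_k_accuracy` and the toy scores are illustrative assumptions.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of clips whose true class is among the k highest-scoring classes.

    scores: list of per-clip score lists (one score per class).
    labels: list of true class indices, one per clip.
    NOTE: standard definition only; the official leaderboard protocol
    is not described here and may differ.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one clip is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example (made-up numbers): 4 clips, 6 classes.
example_scores = [
    [0.50, 0.20, 0.10, 0.09, 0.06, 0.05],  # true class 0 ranks first
    [0.10, 0.40, 0.30, 0.09, 0.06, 0.05],  # true class 2 ranks second
    [0.05, 0.10, 0.15, 0.18, 0.22, 0.30],  # true class 0 ranks last
    [0.30, 0.25, 0.20, 0.15, 0.06, 0.04],  # true class 0 ranks first
]
example_labels = [0, 2, 0, 0]

top1 = top_k_accuracy(example_scores, example_labels, k=1)
top5 = top_k_accuracy(example_scores, example_labels, k=5)
print(f"top-1: {top1:.2f}, top-5: {top5:.2f}")
```

Top-5 is always at least as high as top-1, which is why it is often reported alongside it for datasets with many visually similar classes.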
Download and Start Deep Learning
You can now download both the video data and the JSON files for Something-Something V2 from this link by following the instructions on the webpage.
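As a starting point for working with the JSON files, here is a small sketch that parses annotation records and fills a label template with its annotated objects, as in the “Putting a cup onto a table” example above. The field names `id`, `template`, and `placeholders`, as well as the inline sample records, are assumptions made for illustration; check the downloaded files for the actual schema.

```python
import json
import re
from collections import Counter

# Hypothetical sample records. The field names are assumptions for
# illustration and may not match the released JSON files exactly.
SAMPLE_JSON = """
[
  {"id": "1",
   "template": "Putting [something] onto [something]",
   "placeholders": ["a cup", "a table"]},
  {"id": "2",
   "template": "Pretending to put [something] onto [something]",
   "placeholders": ["a pen", "a table"]}
]
"""

# Matches a bracketed slot such as "[something]".
PLACEHOLDER = re.compile(r"\[[^\]]+\]")


def fill_template(template, objects):
    """Replace each bracketed slot in the template, left to right, with an object."""
    slots = PLACEHOLDER.findall(template)
    if len(slots) != len(objects):
        raise ValueError(
            f"template has {len(slots)} slots but {len(objects)} objects were given"
        )
    remaining = iter(objects)
    return PLACEHOLDER.sub(lambda _match: next(remaining), template)


records = json.loads(SAMPLE_JSON)

# Build a caption per record and count how often each object appears.
captions = [fill_template(r["template"], r["placeholders"]) for r in records]
object_counts = Counter(obj for r in records for obj in r["placeholders"])

for caption in captions:
    print(caption)
print("unique objects:", len(object_counts))
```

With the real files, `json.loads(SAMPLE_JSON)` would be replaced by `json.load` on an opened annotation file. The template string can serve as the classification target and the filled caption as the captioning target.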
TwentyBN’s Public Data Offerings
Powered by our novel data strategy Crowd-Acting™, TwentyBN’s video datasets, such as Something-Something and Jester, are constantly growing in size and quality. With roots in the academic world, we make a significant portion of our proprietary video data free for academic research to encourage work on video understanding and visual common sense in the AI community. If you are interested in licenses for corporate research labs or commercial usage, please contact us.