Release 2025/8/25
August Release!
We’re introducing powerful tools to help teams scale high-quality data creation and reduce unnecessary storage overhead. This release focuses on deduplication, generative data expansion, and batch efficiency improvements.
New Features
Data Deduplication
Automatically identify and remove highly similar images from your datasets. This reduces storage usage, improves dataset diversity, and helps avoid redundancy in model training or annotation. The deduplication process uses visual similarity scores to ensure only unique, meaningful samples are retained.
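To illustrate how similarity-based deduplication can work in principle, here is a minimal sketch. It is not DataVerse's actual implementation: it assumes each image already has an embedding vector, uses cosine similarity as the visual similarity score, and uses a hypothetical threshold of 0.95. The function names and the threshold are illustrative assumptions only.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def deduplicate(
    embeddings: Dict[str, Sequence[float]],
    threshold: float = 0.95,  # hypothetical cutoff, not a product default
) -> List[str]:
    """Greedily keep an image only if it is below the similarity
    threshold against every image kept so far."""
    kept: List[str] = []
    for image_id, vector in embeddings.items():
        if all(cosine_similarity(vector, embeddings[k]) < threshold for k in kept):
            kept.append(image_id)
    return kept


# Hypothetical embeddings: b.jpg is a near-duplicate of a.jpg.
sample = {
    "a.jpg": [1.0, 0.0],
    "b.jpg": [0.99, 0.05],
    "c.jpg": [0.0, 1.0],
}
unique_ids = deduplicate(sample)
```

With these sample vectors, `b.jpg` is dropped as a near-duplicate of `a.jpg`, while `c.jpg` is retained. The greedy pass compares each image against every retained image, so it is quadratic in the worst case. Large datasets would typically need an approximate nearest-neighbor index instead.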


Data Generation (Text / Image / Video → Short Video)
A brand-new generative module that lets you expand your datasets using flexible inputs. You can now generate short videos using:
Text prompts: Describe scenes or actions to synthesize data.
Reference images: Use visual anchors to guide generation.
Clips: Extend or interpolate between existing frames.
This tool helps users bootstrap datasets for rare cases, edge scenarios, or novel environments.

Batch Generation Support
Run data generation at scale with batch operations. Upload a list of prompts or reference sets to produce multiple synthetic video samples in a single job. Perfect for augmenting underrepresented classes or testing model robustness.

To enable the Data Generation features, please contact us for early access.
Data Import Performance Optimization
The data import system has been significantly optimized to reduce processing and wait times, allowing faster ingestion of large-scale datasets.
Coming Soon to DataVerse
Generated Data Integration & Video Clip Curation
We’re extending the generative pipeline by allowing direct import of generated videos into project datasets, making it easier to manage and reuse synthetic data. In addition, we’re introducing video clip curation tools that let the system trim, tag, and organize generated or uploaded clips, turning raw material into structured training assets more efficiently.
These upgrades will streamline the full cycle from generation → selection → training, and give teams tighter control over what enters their model development pipeline.
Stay tuned: these features are coming soon to DataVerse!
Support
For technical support, please contact our support team during business hours.