Generate Data from Documents
This page explains how to generate data from reference documents using Transformer Lab.
Generate Fact-based QnA Dataset From Documents
This page explains how to generate a fact-based question and answer dataset from documents using Transformer Lab.
Hugging Face (YourBench) Dataset Generation
This page explains how to generate data from reference documents using Transformer Lab and the YourBench framework from 🤗 Hugging Face.
Generate Data from Raw Text
This page explains how to generate data from raw text using Transformer Lab.
Generate Data from Scratch
This page explains how to generate data using Transformer Lab when you have no source material, only the concepts the dataset should cover.
Generate Batched RAG Outputs from Datasets
This page explains how to generate batched RAG (Retrieval-Augmented Generation) outputs from datasets using Transformer Lab.
Generate QA, CoT, or Summary Dataset from Documents (synthetic-dataset-kit)
The synthetic-dataset-kit plugin creates synthetic datasets from your uploaded documents using local language models. It supports three generation modes, QA (Question Answering), CoT (Chain of Thought), and Summary, so you can build several kinds of fine-tuning datasets from the same documents.
Generate Image Dataset from Prompts (dataset_imagegen)
This plugin generates an image dataset using a local text-to-image diffusion model such as Stable Diffusion. It takes prompts from a user-provided dataset and outputs the generated images along with their associated metadata.
Auto-Caption Images with WD14 Tagger (wd14_captioner)
This plugin uses the WD14 tagger (from kohya-ss/sd-scripts) to automatically generate Danbooru-style tags for image datasets. It is intended for preparing captions for datasets used to fine-tune Stable Diffusion or similar models.