Generate | Transformer Lab

📄️Generate Data from Documents

This page explains how to generate data from reference documents using Transformer Lab.

📄️Generate Fact-based QnA Dataset From Documents

This page explains how to generate a fact-based question and answer dataset from documents using Transformer Lab.

📄️Hugging Face (YourBench) Dataset Generation

This page explains how to generate data from reference documents using Transformer Lab and the YourBench framework by 🤗 Hugging Face.

📄️Generate Data from Raw Text

This page explains how to generate data from raw text using Transformer Lab.

📄️Generate Data from Scratch

This page explains how to generate data using Transformer Lab when you have no source material, only the concepts the dataset should cover.

📄️Generate Batched RAG Outputs from Datasets

This page explains how to generate batched RAG (Retrieval-Augmented Generation) outputs from datasets using Transformer Lab.

📄️Generate QA, CoT, or Summary Dataset from Documents (synthetic-dataset-kit)

The synthetic-dataset-kit plugin creates synthetic datasets from your uploaded documents using local language models. It supports three generation modes: QA (Question Answering), CoT (Chain of Thought), and Summary, so you can create several kinds of fine-tuning datasets.

📄️Generate Image Dataset from Prompts (dataset_imagegen)

This plugin generates an image dataset using a local text-to-image diffusion model such as Stable Diffusion. It takes prompts from a user-provided dataset and outputs the generated images along with their associated metadata.

📄️Auto-Caption Images with WD14 Tagger (wd14_captioner)

This plugin uses the WD14 tagger (from kohya-ss/sd-scripts) to automatically generate Danbooru-style tags for image datasets. It is well suited to preparing captions for datasets used to fine-tune Stable Diffusion or similar models.