Narrative-grounded synthetic CRM datasets generated from simulated commercial worlds, for teaching, benchmarks, and research.
Public lead-scoring datasets are too small, too overused, or too shallow to sustain serious teaching or research. leadforge generates datasets that feel like they came from a real CRM.
Data isn't sampled from a distribution; it emerges from a simulated company, product, buyers, and go-to-market motion, so every dataset is narratively consistent.
Five motif families (fit-dominant, intent-dominant, sales-execution-sensitive, demo/trial-mediated, buying-committee-friction) are stochastically rewired so no two datasets share the same causal structure.
Intro, intermediate, and advanced tiers are calibrated by signal-to-noise ratio and conversion rate, so you can benchmark a novice project, a serious model, or a stress test in the same framework.
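As a rough illustration of what calibrating a tier by signal-to-noise ratio and conversion rate could look like, the sketch below uses a toy logistic model with one latent `fit` score. This is not leadforge's mechanism: the function names, the single feature, and every parameter value in `TIERS` are hypothetical, and the assumption that harder tiers carry weaker signal, more noise, and rarer conversions is ours.

```python
import math
import random

# Hypothetical tier settings, NOT leadforge's actual calibration values.
# Assumption: harder tiers have weaker signal, more noise, lower base rate.
TIERS = {
    "intro": {"signal_weight": 2.0, "noise_sd": 0.5, "intercept": -1.0},
    "intermediate": {"signal_weight": 1.0, "noise_sd": 1.0, "intercept": -2.0},
    "advanced": {"signal_weight": 0.5, "noise_sd": 2.0, "intercept": -3.0},
}


def simulate_tier(n_leads, signal_weight, noise_sd, intercept, seed):
    """Return (fit_score, converted) pairs from a toy logistic model."""
    rng = random.Random(seed)  # seeded, so the output is reproducible
    rows = []
    for _ in range(n_leads):
        fit = rng.gauss(0.0, 1.0)  # the observable signal
        # Noise on the logit dilutes the signal; the intercept sets the base rate.
        logit = intercept + signal_weight * fit + rng.gauss(0.0, noise_sd)
        p_convert = 1.0 / (1.0 + math.exp(-logit))
        rows.append((fit, rng.random() < p_convert))
    return rows


def conversion_rate(rows):
    """Fraction of simulated leads that converted."""
    return sum(converted for _, converted in rows) / len(rows)


rates = {
    name: conversion_rate(simulate_tier(5000, seed=42, **params))
    for name, params in TIERS.items()
}
```

In this toy setup, raising `noise_sd` relative to `signal_weight` makes the label harder to predict from `fit`, while `intercept` moves the share of positives independently of that.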
The instructor companion ships the hidden causal graph, latent registry, mechanism summary, and full-horizon relational tables: everything redacted from the student view.
Accounts, contacts, leads, touches, sessions, sales activities, opportunities, customers, and subscriptions, all with deterministic IDs and foreign-key (FK) integrity.
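To make "deterministic IDs and FK integrity" concrete, here is a minimal sketch of one way such properties can be achieved and checked. It assumes IDs are derived from the table name, seed, and row index via `uuid.uuid5`; leadforge's real ID scheme is not described here and may differ. The helper names, ID prefixes, and column names (`account_id`, `contact_id`) are illustrative only.

```python
import uuid

# Hypothetical namespace for the example; not a leadforge constant.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.invalid")


def make_id(table, seed, index):
    """Derive a stable ID: the same (table, seed, index) always gives the same value."""
    digest = uuid.uuid5(NAMESPACE, f"{table}:{seed}:{index}").hex
    return f"{table}_{digest[:12]}"


def orphaned_keys(child_rows, fk_column, parent_rows, pk_column):
    """Return foreign-key values in child_rows with no matching parent row."""
    parent_keys = {row[pk_column] for row in parent_rows}
    child_keys = {row[fk_column] for row in child_rows}
    return sorted(child_keys - parent_keys)


# Toy tables: three accounts, five contacts that each point at an account.
accounts = [{"account_id": make_id("acct", 42, i)} for i in range(3)]
contacts = [
    {
        "contact_id": make_id("cont", 42, i),
        "account_id": accounts[i % len(accounts)]["account_id"],
    }
    for i in range(5)
]

# An empty list means every contact references an existing account.
missing = orphaned_keys(contacts, "account_id", accounts, "account_id")
```

Hash-derived IDs like these stay identical across reruns with the same seed, which is what makes joins between regenerated tables reproducible.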
Every public feature is snapshot-safe: no post-anchor events, no terminal-stage columns, no conversion-conditional tables. The redaction contract is code, not convention.
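The snapshot-safety rule can be pictured as a filter applied at a per-lead anchor time. The sketch below is a simplified illustration, not leadforge's redaction code: `snapshot_safe`, the `occurred_at` field, and the names in `TERMINAL_COLUMNS` are hypothetical.

```python
from datetime import datetime

# Hypothetical names for columns that would leak the final outcome.
TERMINAL_COLUMNS = {"closed_won_at", "final_stage"}


def snapshot_safe(lead, events, anchor):
    """Return the lead and its events as they would be visible at `anchor`."""
    # Drop columns that only exist once the outcome is known.
    public_lead = {k: v for k, v in lead.items() if k not in TERMINAL_COLUMNS}
    # Keep only events that happened at or before the anchor time.
    public_events = [e for e in events if e["occurred_at"] <= anchor]
    return public_lead, public_events


anchor = datetime(2024, 3, 1)
lead = {"lead_id": "lead_001", "industry": "logistics", "final_stage": "closed_won"}
events = [
    {"type": "web_session", "occurred_at": datetime(2024, 2, 20)},
    {"type": "demo_completed", "occurred_at": datetime(2024, 3, 9)},
]

public_lead, public_events = snapshot_safe(lead, events, anchor)
```

Here the post-anchor `demo_completed` event and the `final_stage` column are both removed, so nothing in the public view depends on what happened after the snapshot.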
# Generate a full bundle
leadforge generate \
  --recipe b2b_saas_procurement_v1 \
  --seed 42 --mode student_public \
  --difficulty intermediate \
  --n-leads 5000 --out ./out/bundle

# Inspect & validate
leadforge inspect ./out/bundle
leadforge validate ./out/bundle
from leadforge.api import Generator

gen = Generator.from_recipe(
    "b2b_saas_procurement_v1",
    seed=42,
    exposure_mode="student_public",
)
bundle = gen.generate(
    n_leads=5000,
    difficulty="intermediate",
)
bundle.save("./out/bundle")
All tiers share the same fictional company and causal structure. Only signal strength, noise, and missingness differ.
Each tier ships 5,000 leads · 70 / 15 / 15 train/valid/test Parquet splits · 9-table relational bundle
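For 5,000 leads, a 70 / 15 / 15 split works out to 3,500 train, 750 valid, and 750 test rows. The sketch below reproduces that arithmetic with a seeded shuffle. How leadforge actually assigns rows to splits (random, stratified, or time-based) is not stated here, so `split_indices` is a hypothetical helper shown only to make the proportions concrete.

```python
import random


def split_indices(n_rows, seed, percents=(70, 15, 15)):
    """Shuffle row indices with a fixed seed and cut them into three splits."""
    if sum(percents) != 100:
        raise ValueError("percents must sum to 100")
    order = list(range(n_rows))
    random.Random(seed).shuffle(order)  # same seed, same ordering
    # Integer arithmetic avoids floating-point rounding in the cut points.
    n_train = n_rows * percents[0] // 100
    n_valid = n_rows * percents[1] // 100
    train = order[:n_train]
    valid = order[n_train:n_train + n_valid]
    test = order[n_train + n_valid:]  # the remainder goes to test
    return train, valid, test


train, valid, test = split_indices(5000, seed=42)
sizes = (len(train), len(valid), len(test))  # (3500, 750, 750)
```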
Download the v1 dataset on HuggingFace or Kaggle, or generate your own with the Python package.