RL Environments · Secure Code

AI writes insecure code.
We're fixing that.

AI models are trained on static text — great at syntax, terrible at security. Muence builds reinforcement learning environments that teach models to write secure, robust code.

61% of AI-generated code is functional
10.5% of AI-generated code is secure
129+ security challenges benchmarked

The Problem

Models have never been penalized for writing insecure code.

Current training pipelines reward functional correctness — does the code run? — but security is never part of the reward signal. So models learn to write code that works while shipping SQL injection, broken access control, and hardcoded secrets by default.

The fix isn't more data. It's the right environment — one that gives models a penalty signal when they ship insecure code.

Broken Access Control · 78% failure rate
SQL / Command Injection · 65% failure rate
Hardcoded Secrets · 54% failure rate
Missing Input Validation · 71% failure rate
IDOR (Insecure Direct Object Reference) · 60% failure rate

Across 129 AI-generated code samples evaluated on Muence

What We Build

Our Offerings

Two interlocking products — the data that feeds the environments, and the environments themselves.

Preference & Eval Data

Comprehensive security preference datasets for reinforcement learning and model evaluation. Human-labeled audit comparisons across real-world vulnerability patterns, exported as DPO-ready training pairs.

  • Human preference labels on AI security audits
  • DPO-ready JSONL export
  • CWE-mapped vulnerability taxonomy
  • Scalable data gathering pipeline
Get early access
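A DPO-ready JSONL export is one JSON object per line, each pairing a preferred audit with a rejected one. A minimal sketch of what a single record could look like — the field names and values here are illustrative assumptions, not Muence's actual export schema:

```python
import json

# Hypothetical DPO preference pair (illustrative field names, not
# Muence's real schema): one secure audit preferred over an insecure one.
pair = {
    "prompt": "Write a login handler that checks a username and password.",
    "chosen": "Hash the password with bcrypt and use a parameterized query.",
    "rejected": "Compare the plaintext password inside a string-built SQL query.",
    "cwe": "CWE-89",  # CWE-mapped vulnerability label (SQL injection)
}

# JSONL means one JSON object per line; round-trip to verify it's valid.
line = json.dumps(pair)
record = json.loads(line)
print(record["cwe"])  # → CWE-89
```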

Custom RL Environments

Sandboxed code-execution environments with automated security test suites and a scoring function that rewards secure code and penalizes insecure code. They plug directly into post-training pipelines.

  • Real-world code repos, fully containerized
  • Automated security test suites
  • Reward / penalty scoring function
  • Plug-and-play with post-training pipelines
Learn more
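To make the reward/penalty idea concrete, here is a minimal sketch of how such a scoring function could combine functional tests and security checks. This is an assumed shape for illustration, not Muence's implementation; the signature and penalty weight are hypothetical:

```python
def score(passed_functional: int, total_functional: int,
          security_failures: int, penalty: float = 0.5) -> float:
    """Reward = functional pass rate minus a penalty per security failure.

    A fully functional but insecure submission can score at or below
    zero, so the model is explicitly punished for shipping vulnerabilities.
    """
    if total_functional == 0:
        return 0.0
    functional_reward = passed_functional / total_functional
    return functional_reward - penalty * security_failures

print(score(4, 4, 0))  # all tests pass, no vulnerabilities → 1.0
print(score(4, 4, 2))  # functional but insecure → 0.0
```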

The Pipeline

How we generate training signal

01

Generate

AI models generate code from real-world prompts — login systems, file uploads, payment flows — without any security guidance.

02

Audit

Multiple models independently audit the same code for security vulnerabilities. Results are compared side-by-side.

03

Label & Train

Human reviewers vote on audit quality. Preference pairs become DPO training data and RL reward signals.
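Step 03 can be sketched as a small transformation: human votes on two competing audits of the same code sample are ordered into a preference pair. The data shapes and helper below are hypothetical, shown only to illustrate the idea:

```python
def to_dpo_pair(code: str, audit_a: str, audit_b: str,
                votes_a: int, votes_b: int) -> dict:
    """Order two audits by human preference votes into a DPO pair.

    Illustrative helper (not Muence's actual pipeline): the audit with
    more reviewer votes becomes "chosen", the other "rejected".
    """
    if votes_a >= votes_b:
        chosen, rejected = audit_a, audit_b
    else:
        chosen, rejected = audit_b, audit_a
    return {"prompt": code, "chosen": chosen, "rejected": rejected}

pair = to_dpo_pair(
    "def login(username, password): ...",
    "Flags SQL injection in the query on line 3.",
    "No issues found.",
    votes_a=5, votes_b=1,
)
print(pair["chosen"])  # → Flags SQL injection in the query on line 3.
```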

Talk to the team

Interested in the research, a data partnership, or running your model through our benchmark?

Muence

RL Environments for Secure Code