EDITOR’S NOTE
Hey there 👋
Correct me if I’m wrong, but we’ve all had that feeling right after hitting “send” on an important email. That little jolt of panic where you think, “Wait, did I spell the client’s name right? Did I attach the right file?”
Now imagine that feeling, but for an AI that’s about to send 10,000 emails on your behalf, or schedule a month’s worth of social media posts.
We’ve given AI the keys to the car, but we haven’t always buckled its seatbelt, and we’re starting to see the consequences.
However, this is entirely preventable!
This issue is about building the guardrails, the un-sexy but absolutely critical part of making AI work for you, not against you. We’re talking about how to set up the systems that keep your AI on-brand, on-message, and out of trouble.
Let’s get into it. This is the stuff that lets you sleep at night. 😴
TL;DR 📋
AI guardrails are a governance system. They ensure your AI reflects your brand standards, policies, and values before anything goes live at scale.
There are three core guardrails: Content Moderation (automated risk filters), Brand Safety Protocols (your rulebook for tone, facts, and boundaries), and Approval Workflows (a human signs off before publishing).
Designing handoff points is the most critical step. You must clearly define when the AI generates, when it pauses, and when a human reviews or overrides.
The goal is controlled amplification. The best AI setups make humans faster and sharper by handling repetitive work, while humans retain judgment, accountability, and final authority.
NEWS YOU CAN USE 📰

What are AI guardrails? AI guardrails help ensure that an organization’s AI tools and their application in the business reflect the organization’s standards, policies, and values. [Source: McKinsey & Company]
Brand Safety & AI Ads: How Ethical Failures Kill Performance. AI-based ad placements, like Meta's Advantage+, can inadvertently associate brands with unsafe or misleading content when algorithmic and data oversight are lacking. Maintaining ethical standards in content sourcing and targeting protects brand reputation and campaign effectiveness. [Source: Busy Seed]
Anthropic: Claude faces ‘industrial-scale’ AI model distillation. Anthropic has detailed three “industrial-scale” AI model distillation campaigns by overseas labs designed to extract abilities from Claude. These competitors generated over 16 million exchanges using approximately 24,000 deceptive accounts. Their goal was to acquire proprietary logic to improve their competing platforms. [Source: AINews]
DPD’s Chatbot Goes Rogue, Writes a Poem About How Useless It Is. A customer, frustrated with a DPD delivery chatbot, managed to get it to swear, criticize the company, and even write a haiku about how bad it was at its job. [Source: Daily Mail]
Meet America’s Newest $1B Unicorn
It just surpassed a $1B valuation, joining private US companies like SpaceX and OpenAI. Unlike those companies, you can invest in EnergyX today. Industry giants like General Motors and POSCO already have. Why? EnergyX’s tech can recover 3X more lithium than traditional methods. Now, they’re preparing 100,000+ acres of lithium-rich Chilean land for commercial production. Buy private EnergyX shares alongside 40k+ people at $11/share through 2/26.
This is a paid advertisement for EnergyX Regulation A offering. Please read the offering circular at invest.energyx.com. Under Regulation A, a company may change its share price by up to 20% without requalifying the offering with the Securities and Exchange Commission.
A FEW REAL-WORLD CAUTIONARY TALES 😬
Theory is great, but seeing where things have gone wrong is even better. Let's look at a few recent high-profile AI fails and break down exactly which guardrails would have saved the day.
The Willy Wonka Experience: When Hype Meets Reality (and Loses)

What Happened: An event in Glasgow called the "Willy Wonka Experience" used whimsical AI-generated images to sell tickets. The ads promised a magical world of candy and wonder, while the reality was a nearly empty warehouse with a sad-looking Oompa Loompa and a few plastic props, leaving parents furious, kids crying, and the internet piling on.
How Guardrails Would Have Helped:
Brand safety protocol: A clear rule stating "Promotional materials must accurately reflect the real-world experience" would have been the first line of defense. The AI generated a fantasy, not a depiction of the actual event.
Approval workflow: A human-in-the-loop workflow would have forced someone to look at the AI-generated images and ask a simple question: can we actually build what we’re advertising? That single approval step would have halted the campaign, or at least reined it in to match reality.
Google's Gemini: The Dangers of Unchecked Bias

What Happened: In February 2024, Google apologized and temporarily stopped its Gemini AI tool from generating images of people. The model had produced historically inaccurate and, in some cases, offensive depictions, inserting racial diversity into historical scenarios where it didn’t exist. Examples include portraying 1943 German Nazi soldiers as people of color, creating female Popes, and, in some instances, refusing to generate images of white people. The intent may have been to promote diversity, but the execution erased history and created a PR nightmare.
How Guardrails Would Have Helped:
Content moderation: An automated system could have been trained to flag outputs that significantly deviate from well-established historical facts. That’s complex, but not impossible.
Brand safety protocol: This is the big one. Google's brand is built on accuracy and trust. A protocol defining how to handle sensitive historical and cultural topics would have been essential, including rules such as "Do not alter the known historical demographics of specific events or groups."
Approval workflow (Internal): Before releasing a tool this powerful, a rigorous internal review process with diverse human testers is crucial to red-teaming the product and identifying these kinds of issues before they reach the public.
The DPD Chatbot That Hated Its Job

What Happened: A customer, frustrated with DPD's chatbot, decided to see how far he could push it. After a bit of creative prompting, he got the chatbot to swear, call DPD "the worst delivery firm in the world," and even compose a haiku about its own uselessness.
How Guardrails Would Have Helped:
Content moderation: This is a classic failure of the “automated bouncer.” A content moderation filter should have immediately blocked the chatbot from swearing or disparaging the company. This is the most basic level of guardrail, and it still failed.
Brand safety protocol: The chatbot clearly wasn't trained on a proper brand voice. A protocol would have defined its personality (helpful, professional) and, more importantly, its boundaries. It would have included a rule like, "Never express negative opinions about the company or its services."
Approval workflow (for Training): The prompts used to train and fine-tune the chatbot should have gone through an approval process to ensure they covered a wide range of potential user behaviors, including adversarial attacks like this one.
WHY YOUR AI NEEDS A BABYSITTER 👶
We’re excited about the possibilities that AI will bring, and we’re under pressure to show results. But in the rush to automate, we’re skipping a crucial step: building the bumpers.
Think of it like this: you wouldn’t let a new intern post directly to your company’s social media accounts without any training or oversight. You’d give them brand guidelines, show them examples, and probably review their first few posts.
An AI without guardrails is like a super-enthusiastic but slightly unhinged intern. It’s full of ideas, works 24/7, and has a ton of potential. But it also lacks common sense and nuance, and it has a tendency to take things a little too literally, which is how you end up with a chatbot that writes poetry about its own incompetence.
The Three Guardrails That Actually Matter
Building effective guardrails is about implementing a three-layered defense system. Here’s what that looks like.
1. Content Moderation: The Automated Bouncer
This is your first line of defense.
A content moderation system automatically scans AI-generated text and images for obvious problems before they ever see the light of day.
What it looks for:
Offensive content: Swearing, hate speech, and other inappropriate language.
PII (Personally Identifiable Information): Things like phone numbers, email addresses, and credit card numbers.
Harmful topics: Content related to violence, self-harm, or other sensitive subjects.
Basic brand violations: Using a competitor’s name, for example.
Tools like Azure Content Moderator or even simple keyword filters can handle this. The goal here is to catch the low-hanging fruit automatically so you don’t have to.
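To make this concrete, here’s a minimal sketch of a keyword-and-PII filter in Python. The banned terms and regex patterns are made-up placeholders; a real deployment would lean on a dedicated moderation service like the ones above.

```python
import re

# Hypothetical blocklist; swap in your own banned terms and competitor names.
BANNED_TERMS = ["damn", "synergy", "competitorco"]

# Very rough patterns for common PII. Real systems use dedicated detectors.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\b\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
}

def moderate(text: str) -> list[str]:
    """Return reasons to block this draft; an empty list means it passes."""
    flags = []
    lowered = text.lower()
    for term in BANNED_TERMS:
        if term in lowered:
            flags.append(f"banned term: {term}")
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            flags.append(f"possible PII: {label}")
    return flags

draft = "Call me at 555-867-5309 and I'll walk you through the synergy."
print(moderate(draft))  # ['banned term: synergy', 'possible PII: phone']
```

Even a crude filter like this catches the embarrassing stuff before it ships; the fancier tools just do the same job with better accuracy.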
2. Brand Safety Protocols: The Rulebook
This is where you teach the AI what it means to be your brand.
A content moderation system knows not to swear, and a brand safety protocol knows not to use the word “synergy” because you banned it in 2024.
Your brand safety protocol should be a clear, simple document that defines:
Your brand voice and tone: Whether you’re formal or casual, funny or serious. A list of “words we use” and “words we don’t use” is incredibly helpful here.
Your visual style: When generating images, establish clear rules regarding the use of photos or illustrations, select a cohesive color palette, and identify visual clichés to avoid.
Your “No-Go” zones: The topics you won’t touch, whether that’s politics, religion, or your competitor’s latest funding round.
Your factual guardrails: The approved wording for claims and statistics, so they stay accurate. For example, “We always say we have over 10,000 customers,” rather than the vaguer “thousands.”
This is a living document that you’ll update as your brand evolves.
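If you want the rulebook to be enforceable rather than just aspirational, you can encode parts of it as a config your tools check automatically. Here’s a minimal Python sketch; every rule in it is a made-up example, not a recommendation:

```python
# A sketch of a brand safety protocol as a machine-checkable config.
# Every rule below is a made-up example; replace with your actual rulebook.
BRAND_PROTOCOL = {
    "banned_words": ["synergy", "disrupt", "game-changer"],
    "no_go_topics": ["politics", "religion", "competitor funding"],
    # Approved phrasing for factual claims: vague wording -> exact wording.
    "approved_claims": {
        "thousands of customers": "over 10,000 customers",
    },
}

def check_protocol(text: str) -> list[str]:
    """Flag drafts that break the rulebook, with human-readable reasons."""
    violations = []
    lowered = text.lower()
    for word in BRAND_PROTOCOL["banned_words"]:
        if word in lowered:
            violations.append(f"banned word: {word}")
    for topic in BRAND_PROTOCOL["no_go_topics"]:
        if topic in lowered:
            violations.append(f"no-go topic: {topic}")
    for vague, exact in BRAND_PROTOCOL["approved_claims"].items():
        if vague in lowered:
            violations.append(f"replace '{vague}' with '{exact}'")
    return violations

# Flags "synergy" and the vague customer claim.
print(check_protocol("We serve thousands of customers with true synergy."))
```

Keeping the rules in one config means that when your brand evolves, you update one file instead of retraining habits.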
3. Approval Workflows: The Human in the Loop
This is the most important guardrail.
No matter how good your AI is, you need a human to have the final say on what gets published.
An approval workflow doesn’t have to be complicated. It can be as simple as having your AI drop its generated content into a Google Doc for you to review, or you can set up a simple workflow in a tool like Make or Zapier that sends a Slack message with “Approve” and “Reject” buttons.
The key is to make it easy and fast; if your approval process is a pain, people will skip it. The aim is a quick, final check that catches what automated systems miss: context, nuance, and whether a piece of content just feels right.
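If you’d rather start even simpler than Make or Zapier, the whole idea fits in a few lines: a queue of drafts and a human saying yes or no. A toy Python sketch, where the publish step is just a stub:

```python
# A bare-bones human-in-the-loop gate: nothing ships without a yes.
# publish() is a stub; in practice it might call your CMS or scheduler.

def publish(content: str) -> None:
    print(f"PUBLISHED: {content}")

def review_queue(drafts: list[str]) -> None:
    """Show each AI draft to a human; only approved drafts get published."""
    for draft in drafts:
        print(f"\n--- DRAFT ---\n{draft}")
        decision = input("Approve? [y/N] ").strip().lower()
        if decision == "y":
            publish(draft)
        else:
            print("Rejected. Draft goes back for editing.")

ai_drafts = [
    "Big news! We just passed 10,000 customers. Thank you!",
    "Our competitor's latest release is, honestly, embarrassing.",
]
review_queue(ai_drafts)
```

The Slack-button version is the same loop with nicer plumbing; what matters is that a human makes the final call.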
THIS WEEK’S PROMPT 🧠

Use this prompt with your preferred LLM to create a first draft of your brand’s AI usage policy.
The Scenario: You’re a marketing leader who needs to create a simple, clear AI policy for your team. You want to empower them to use AI, but you also need to ensure they do so safely and responsibly.
The Prompt:
"You are an AI Strategy Consultant. I need to create a one-page ‘AI Rules of the Road’ document for my marketing team. It needs to be clear, concise, and easy to understand. It shouldn’t be a scary legal document; it should feel like a helpful guide.
Key Sections to Include:
Our Philosophy on AI: A short, optimistic paragraph about why we’re using AI (e.g., “We believe AI can help us be more creative and effective, but it’s a tool to assist us, not replace us.”)
The Golden Rule: You Are Responsible. A clear statement that the person using the AI is ultimately responsible for the output. The AI is a tool; you are the craftsman.
Our ‘No-Go’ Zones: A list of things we never use AI for. (e.g., “We never use AI to generate content about our competitors,” or “We never input confidential customer data into a public AI tool.”)
The Human-in-the-Loop Rule: A simple explanation of our approval process. (e.g., “All AI-generated content that will be seen by customers must be reviewed and approved by at least one other human before it goes live.”)
Our Approved Tools: A short list of the AI tools that have been vetted and approved for use.
For each section, write a short, simple paragraph that a busy marketer can read and understand in 30 seconds."
TOOLS WE USE ⚒️
These are the most popular AI tools we use at Rise Up Media. If you're not using them already, they're worth a look.
Claude Cowork: Claude Code but for non-devs (like us!)
Manus AI: General-purpose AI agent we love (and use to create this newsletter)
n8n: Open-source automation (if you like that sort of thing)
Relevance AI: No-code create-your-own AI agents platform
OpusClip: Auto-clips long videos into shorts (and is really good at it)
Full disclosure: some links above are affiliate links. If you sign up, we’ll earn a small commission at no extra cost to you.
HAVING FUN WITH AI 😊
Everyone’s out there launching their own version of OpenClaw. First Anthropic, now Perplexity. The memes on X have been fun this week. 😁
WRAPPING UP 🌯
Building AI guardrails isn’t the most glamorous part of working with AI. It’s not as exciting as generating a perfect image or writing a brilliant piece of copy in 10 seconds.
But it’s the foundation that allows you to do all the cool stuff safely and at scale, turning AI from a risky experiment into a reliable, strategic advantage.
So take the time to build the bumpers, set the rules, and always put a human in the loop.
Until next time, keep prompting the horizon. 🌅
Alex Lielacher
P.S. If you want your brand to gain more search visibility in Google AI Mode, ChatGPT, and Perplexity, reach out to my agency, Rise Up Media. We can help you with that!