Services: Backend Engineering, Product
Industry: B2B Marketplace
Year: 2022-2024

Feature flags that let the data decide.

A B2B marketplace needed to test UI variants, backend behavior, and entire features without deploying separate codepaths or buying a third-party experimentation platform. The existing stack was a Django monolith. The solution: django-waffle flags with percentage-based rollout, visitor-level persistence, and a custom admin interface that lets the product team toggle experiments in seconds.

01.
THE CHALLENGE

Testing Without a Testing Platform

The platform had no experimentation infrastructure. Every change was a full deploy. Rollbacks required another deploy. There was no way to show feature X to 20% of visitors and measure the difference. The product team wanted to A/B test contact form layouts, pricing displays, and onboarding flows. Building a full experimentation platform was out of scope. The solution had to work inside the existing Django stack with zero additional infrastructure.

Every deploy was an all-or-nothing bet. Feature flags turned deploys into experiments with a rollback button.

02.
THE SOLUTION

Waffle Flags with Visitor Persistence

django-waffle provides feature flags that can be toggled per user, per group, or for a percentage of visitors. The system ties flag evaluation to the visitor tracking middleware: each visitor gets a deterministic hash, and the flag's percentage threshold decides whether they see variant A or B. Once assigned, the visitor keeps the same variant for the entire session. A context processor injects the flag state into every template, so template-level conditionals need no Python changes. The same flags control backend behavior through Python-level checks.
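The hashing code itself is not shown in this case study, so the following is only a minimal sketch of how deterministic, percentage-based assignment can work. The function names, the choice of SHA-256, and the way the visitor ID is combined with the flag name are all assumptions made for illustration.

```python
import hashlib


def bucket_for(visitor_id: str, flag_name: str) -> float:
    """Map a visitor and flag to a stable number in [0, 100).

    Hypothetical helper: hashing the flag name together with the visitor ID
    means a visitor can land in different buckets for different flags.
    """
    digest = hashlib.sha256(f"{flag_name}:{visitor_id}".encode("utf-8")).hexdigest()
    # Read the first 8 hex digits as a 32-bit integer and scale it to a percentage.
    return int(digest[:8], 16) / 0x100000000 * 100


def sees_variant_b(visitor_id: str, flag_name: str, percent: float) -> bool:
    """Return True when the visitor's bucket falls below the rollout percentage."""
    return bucket_for(visitor_id, flag_name) < percent


# Simulate a 50% rollout over made-up visitor IDs.
visitors = [f"visitor-{i}" for i in range(10_000)]
in_b = sum(sees_variant_b(v, "show_new_contact_form", 50) for v in visitors)
print(f"{in_b / len(visitors):.1%} of simulated visitors see variant B")
```

Because the bucket depends only on the visitor ID and the flag name, raising the percentage from 10 to 50 keeps every visitor who already saw variant B in variant B.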

The custom FlagAdmin with inline toggles:

Python
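from django.contrib import admin
from waffle.admin import FlagAdmin as WaffleFlagAdmin
from waffle.models import Flag


class FlagAdmin(WaffleFlagAdmin):
    # Columns shown in the admin changelist.
    list_display = [
        'name',
        'everyone',
        'percent',
        'superusers',
        'staff',
        'authenticated',
        'note',
        'created',
        'modified',
    ]
    # Columns editable inline from the changelist; 'name' stays the row link.
    list_editable = [
        'everyone',
        'percent',
        'superusers',
        'staff',
        'authenticated',
        'note',
    ]
    list_filter = ['everyone', 'superusers', 'staff']


# Assumes the default waffle Flag model: replace waffle's own admin registration.
admin.site.unregister(Flag)
admin.site.register(Flag, FlagAdmin)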

Template-level flag conditionals:

HTML
{# base template — contact form A/B test #}

{% load waffle_tags %}

{% flag "show_new_contact_form" %}
  {# Variant B: simplified form #}
  {% include "contact/_form_v2.html" %}
{% else %}
  {# Variant A: original form #}
  {% include "contact/_form_v1.html" %}
{% endflag %}

Production Patterns

Graduated Rollout Strategy

New features start at 0% and increase in stages: 5% for internal QA, 10% for early signal, 50% to gather enough traffic for a statistically meaningful comparison, and 100% for full launch. Each stage runs for a defined period with monitoring. If conversion drops or error rates spike, the flag is set back to 0% immediately, with no deploy needed.
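The stage-by-stage decision can be sketched as a small function. The stage percentages come from the description above; the rollback thresholds, the StageMetrics fields, and the function name are hypothetical, since the case study does not state what counts as a conversion drop or an error spike.

```python
from dataclasses import dataclass

# Stage percentages from the rollout description.
ROLLOUT_STAGES = [0, 5, 10, 50, 100]

# Assumed thresholds, chosen only for illustration.
MAX_ERROR_RATE = 0.02        # roll back above a 2% error rate
MAX_CONVERSION_DROP = 0.10   # roll back if conversion falls 10% relative to control


@dataclass
class StageMetrics:
    error_rate: float           # share of failed requests in the flagged variant
    control_conversion: float   # conversion rate of variant A
    variant_conversion: float   # conversion rate of variant B


def next_percent(current: int, metrics: StageMetrics) -> int:
    """Return the rollout percentage to apply after reviewing one stage.

    Raises ValueError if `current` is not one of ROLLOUT_STAGES.
    """
    if metrics.error_rate > MAX_ERROR_RATE:
        return 0
    if metrics.control_conversion > 0:
        drop = 1 - metrics.variant_conversion / metrics.control_conversion
        if drop > MAX_CONVERSION_DROP:
            return 0
    # Healthy stage: advance one step, staying at 100% once reached.
    idx = ROLLOUT_STAGES.index(current)
    return ROLLOUT_STAGES[min(idx + 1, len(ROLLOUT_STAGES) - 1)]
```

In a setup like the one described, the returned value would be written to the flag's percent field through the admin, which is what makes the rollback a data change instead of a deploy.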

Flag Hygiene and Cleanup

Every flag has a corresponding ticket with an expiration date. After a test concludes, the winning variant becomes the default code path and the flag is removed. Dead flags are tracked via commented-out model fields that serve as a historical record of past experiments. This prevents flag accumulation and keeps the codebase readable.
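One way to enforce the expiration dates is a periodic check that lists overdue flags. This is a sketch under stated assumptions: the FlagRecord structure, the ticket identifiers, and the second flag name are hypothetical, and the case study does not say where the ticket data actually lives.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FlagRecord:
    name: str
    ticket: str     # hypothetical ticket reference
    expires: date   # date by which the experiment should be concluded


def expired_flags(records, today):
    """Return the names of flags whose expiration date has passed, oldest first."""
    overdue = [r for r in records if r.expires < today]
    return [r.name for r in sorted(overdue, key=lambda r: r.expires)]


# Made-up records for illustration only.
records = [
    FlagRecord("show_new_contact_form", "TICKET-1", date(2023, 3, 1)),
    FlagRecord("simplified_pricing_display", "TICKET-2", date(2023, 9, 1)),
]
print(expired_flags(records, today=date(2023, 6, 1)))
```

A flag that expires on the day of the check is not yet reported, because the comparison is strict.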

Backend and Frontend Consistency

The same waffle flag controls both the template output and the Python logic. If variant B shows a different contact form layout, the backend endpoint that processes the form also checks the flag to apply variant-specific validation rules. This prevents mismatches where the frontend shows one variant but the backend expects another.
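A minimal sketch of variant-aware validation follows. The field sets for each variant are assumptions, since the case study does not list the real form fields, and the function name is hypothetical. The flag state is passed in as a plain boolean so the logic can be tested without a request object; in a Django view it would typically come from waffle.flag_is_active(request, "show_new_contact_form").

```python
def validate_contact_form(data: dict, variant_b_active: bool) -> dict:
    """Return a mapping of field name to error message; empty means valid."""
    # Assumed field sets: variant B is the simplified form, so it requires less.
    if variant_b_active:
        required = ["email", "message"]
    else:
        required = ["name", "company", "email", "message"]

    errors = {}
    for field in required:
        if not str(data.get(field, "")).strip():
            errors[field] = "This field is required."

    # Very rough email check, shared by both variants.
    email = str(data.get("email", "")).strip()
    if email and "@" not in email:
        errors["email"] = "Enter a valid email address."
    return errors
```

Because the template and this check read the same flag, a submission from the simplified form is never rejected for missing fields that the visitor was never shown.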

03.
THE RESULT

Data-Driven Product Decisions

The flag system ran 14 A/B tests over 18 months. The new contact form variant increased conversion by 23%. A simplified pricing display reduced bounce rate by 11%. Three proposed features were killed before full development because early flag data showed no user interest. Average time from idea to live experiment: 2 hours. Average rollback time: 30 seconds. Zero experimentation-related incidents.

KEY METRICS

14 A/B Tests Run
+23% Best Conversion Lift
30s Rollback Time

WHAT THE CLIENT SAYS

"We used to argue about features for weeks. Now we ship both versions, let 500 visitors decide, and move on. Three features we were sure about turned out to be worthless. The data saved us months of wasted development."

Product Manager

B2B Marketplace · Product Team
