┌ ┐ └ ┘

AutoCSS: A Hackathon Retrospective

In 2018, I built a tool to convert hand-drawn wireframes to HTML using OpenCV contour detection. Eight years later, VLMs do this with a single API call. A love letter to scrappy engineering.

python · opencv · hackathon · retrospective · 9 min read

[ github ] [ demo ]

The Pitch

In 2018, at a hackathon somewhere in my early undergrad, I had what felt like a brilliant idea: what if you could sketch a website layout on paper, point a camera at it, and have working HTML pop out the other end?

XKCD 1425: Tasks — 'In the 60s, Marvin Minsky assigned a couple of undergrads to spend the summer programming a computer to use a camera to identify objects in a scene. He figured they'd have the problem solved by the end of the summer. Half a century later, we're still working on it.'

The comic’s punchline is that identifying whether a photo contains a bird requires a research team and five years. Converting hand-drawn wireframes to code? That was our bird.

But we were undergrads with 48 hours, a laptop, and the unwavering confidence that comes from not knowing what you don’t know.

So we built it anyway.

How It Works

Here’s what we were trying to do:

┌ ┐ └ ┘

  ┌─────────────┐         ┌─────────────┐
  │  ┌───────┐  │         │   <div>     │
  │  │ ███   │  │         │     <div>   │
  │  ├───┬───┤  │   ───▶  │       ...   │
  │  │ █ │ █ │  │         │     </div>  │
  │  └───┴───┘  │         │   </div>    │
  └─────────────┘         └─────────────┘
    hand-drawn               HTML output

The Dream

The key insight: OpenCV already knows how to find nested shapes. We just had to convince it that rectangles-inside-rectangles is the same thing as divs-inside-divs.

The Pipeline

┌ ┐ └ ┘

┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│  Input   │    │ Cleanup  │    │  Edges   │    │ Contours │    │  Output  │
│ ──────── │    │ ──────── │    │ ──────── │    │ ──────── │    │ ──────── │
│  photo   │───▶│threshold │───▶│  canny   │───▶│RETR_TREE │───▶│   HTML   │
│  of      │    │  + blur  │    │  edge    │    │ hierarchy│    │   DOM    │
│ drawing  │    │          │    │detection │    │          │    │   tree   │
└──────────┘    └──────────┘    └──────────┘    └──────────┘    └──────────┘

Processing Pipeline

And here’s what that actually looked like:

Each frame shows a real intermediate output from the pipeline:

Input: The hand-drawn wireframe on paper
Binarized: Threshold to clean black-on-white
Edges: Canny edge detection finds the lines
Contours: OpenCV traces every closed shape (notice the double lines!)
Output: Clean bounding rectangles, ready for DOM conversion

The magic happens in step 4. Let me show you.

The RETR_TREE Trick

When you call cv2.findContours(), you can choose how to organize the results. Most tutorials use RETR_EXTERNAL (just outer shapes) or RETR_LIST (flat list).

We used RETR_TREE.

This returns the full parent-child hierarchy of every contour. And here’s the thing: a contour hierarchy is already a DOM tree.

┌ ┐ └ ┘

What OpenCV returns:              What HTML needs:

Contour 0 (outermost)            <div>          ← root
  ├── Contour 1                    <div>        ← child
  │     ├── Contour 3                <div/>     ← grandchild
  │     └── Contour 4                <div/>     ← grandchild
  └── Contour 2                    <div/>       ← child
                                 </div>

Same structure.                  Same structure.
Different syntax.

Contour Hierarchy = DOM Tree

The recursive function to walk this tree is tiny:

def build_dom(contour_idx, hierarchy):
    """Walk contour tree, emit nested divs."""
    node = hierarchy[0][contour_idx]
    child_idx = node[2]  # First child
    
    if child_idx == -1:
        return "<p>content</p>"  # Leaf node
    
    html = ""
    while child_idx != -1:
        html += f"<div>{build_dom(child_idx, hierarchy)}</div>"
        child_idx = hierarchy[0][child_idx][0]  # Next sibling
    
    return html

That’s it. That’s the whole DOM generation. Ten lines.

XKCD 1838: Machine Learning — People kept asking if we used machine learning. We didn't even have a pile.

The Double Contour Problem

Of course, nothing is that simple. Hand-drawn lines aren’t perfect geometric shapes. Edge detection on thick marker strokes produces… interesting results.

When Canny runs on a hand-drawn line, it finds two edges, one on each side of the ink stroke. This means every rectangle becomes two nested contours:

┌ ┐ └ ┘

What you draw:          What OpenCV sees:

    ████████                ┌──────────┐  ← outer edge of ink
    █      █                │ ┌──────┐ │  ← inner edge of ink
    █      █       ───▶     │ │      │ │
    █      █                │ │      │ │
    ████████                │ └──────┘ │
                            └──────────┘

 One rectangle            Two nested contours!

The Double Contour Problem

Without fixing this, every <div> would have a useless wrapper div inside it.

The Fix: Skip Single Children

The solution is elegant: if a contour has exactly one child, and that child is roughly the same size, skip the parent and promote the grandchildren:

┌ ┐ └ ┘

Before:                        After:

     ┌───┐                         ┌───┐
     │ 0 │                         │ 0 │
     └─┬─┘                         └─┬─┘
       │                             │
     ┌─┴─┐ ← single child,      ┌───┴────┐
     │ 1 │   same size          │        │
     └─┬─┘   (duplicate!)     ┌─┴─┐   ┌─┴─┐
       │                      │ 2 │   │ 3 │
   ┌───┴────┐                 └───┘   └───┘
   │        │
 ┌─┴─┐   ┌─┴─┐               Grandchildren promoted!
 │ 2 │   │ 3 │
 └───┘   └───┘

The dup_tree Fix

The dup_tree() function walks the tree bottom-up, pruning these ghost nodes. It’s 15 lines of Python and it turned the output from “unusable mess of nested wrappers” to “actually correct DOM structure.”

Nested boxes became nested divs. That’s not nothing.

The Twist

To understand why our approach made sense, you need to understand what the world looked like in 2018.

┌ ┐ └ ┘

┌─────────────────────────────────────────────────────────────────┐
│                    WHAT WAS HOT IN 2018                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  GANs ████████████████████████████  CycleGAN, pix2pix          │
│  CNNs ███████████████████████       "ImageNet is solved"        │
│  RNNs ██████████████                Seq2seq, attention          │
│  RL   █████████                     AlphaGo hype                │
│                                                                 │
│  Transformers ██                    (2017 paper, still ignored) │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

The 2018 AI Landscape

The transformer paper, “Attention Is All You Need,” had been published the year before, on June 12, 2017. By our 2018 hackathon, nobody around us thought it was about to rewrite this whole problem class.

At a hackathon in 2018, you used OpenCV:

Ran on any laptop (no GPU required)
Excellent Python bindings
Thousands of tutorials
Actually worked in 48 hours

Deep learning meant training CNNs for days on a GPU cluster. Not hackathon-friendly.

What Others Were Building

We weren’t alone in trying sketch-to-code:

┌ ┐ └ ┘

Project	Approach	Result
Pix2Code (2017)	CNN + LSTM	77% accuracy on 16 synthetic tokens
Airbnb (2017)	Component classifier	Only worked with their 150 pre-defined components
Microsoft Sketch2Code (2018)	Azure CV + rules	Supported 5 element types

Other Sketch-to-Code Projects

Everyone had the same idea. Nobody cracked it.

XKCD 1319: Automation — We spent 48 hours automating something that takes 30 minutes by hand.

Then, March 2023

The GPT-4 Developer Livestream. Greg Brockman photographs a hand-drawn sketch on a napkin. Seconds later: a working website.

The exact project we built in 2018. But working perfectly, instantly, with no edge detection, no RETR_TREE, no “draw with thick marker in good lighting.”

The Difference

┌ ┐ └ ┘

AutoCSS (2018):                    GPT-4V (2024):

┌──────────┐                       ┌──────────┐
│ pixels   │                       │ pixels   │
└────┬─────┘                       └────┬─────┘
     │                                  │
     ▼                                  │
┌──────────┐                            │
│ edges    │                            │
└────┬─────┘                            │
     │                                  │
     ▼                                  ▼
┌──────────┐                       ┌──────────┐
│ contours │                       │"I see a  │
└────┬─────┘                       │ search   │
     │                             │ bar and  │
     ▼                             │ a nav    │
┌──────────┐                       │ menu..." │
│hierarchy │                       └────┬─────┘
└────┬─────┘                            │
     │                                  ▼
     ▼                             ┌──────────┐
┌──────────┐                       │ working  │
│nested    │                       │ React +  │
│divs      │                       │ Tailwind │
└──────────┘                       └──────────┘

Geometry                           Understanding

2018 vs 2024

Where we saw “rectangle at (100,200) with width 300,” a VLM sees “search bar in header, probably e-commerce, needs placeholder text.”

The Stanford Design2Code benchmark (2024): GPT-4V generated pages that could replace originals 49% of the time on 484 real websites.

Not 77% on 16 synthetic tokens. Near-majority replacement on arbitrary sites.

The Modern Toolkit

tldraw Make Real: Draw → HTML/Tailwind. ~$0.04 per generation.
screenshot-to-code: 71K+ GitHub stars. Any mockup → React/Vue/HTML.
v0.dev: 6M+ developers. Production components from sketches.

One API call. No contour detection required.

Coda

XKCD 2138: Wanna See the Code? — Opening the repo after 7 years felt exactly like this.

I recently opened the AutoCSS repository for the first time in eight years.

The code is exactly what you’d expect:

Jupyter notebooks as the main codebase
Variables named heirarchy (typo preserved for posterity)
Commented-out print statements everywhere
Hardcoded paths to lay6.png

It’s a mess. It’s also a time capsule.

What Hackathons Are For

Hackathons aren’t about production code. They’re about asking “what if?”

What if contour hierarchies mapped to DOM trees? They do. What if you could draw a website and have it appear? You can. What if the weird idea could work? Sometimes.

The Right Idea, Wrong Time

The core insight (visual layouts should convert directly to code) was right. The industry spent eight years proving it.

We were just too early, with tools too primitive.

There’s something satisfying about that. Building something that anticipated where the field was going, even if we couldn’t get there ourselves.

The Code Is Still There

The repository is public. The contours still detect on clean drawings.

It’s not useful anymore. But it’s mine.

And somewhere in those nested loops and misspelled variables is proof that I once spent 48 hours turning hand-drawn boxes into HTML because I thought it would be cool.

It was.