Batching: How to Minimize Draw Calls


This tutorial explains what 2D batch rendering is, why it’s crucial for performance, and how to design a system that renders thousands of sprites with a single draw call using Metal and Metal-C++.

What Is Batch Rendering?

In any modern graphics API, including Metal, issuing draw calls isn’t free. Each draw call costs CPU time: the state it uses must be validated and the command must be encoded before the GPU can execute it. That overhead can quickly become a bottleneck when draw calls number in the thousands.

Imagine rendering 1,000 individual sprites using 1,000 separate draw calls. Each call involves binding a vertex buffer, issuing a command, and potentially switching states like shaders or textures. This cost quickly accumulates, leading to dramatic framerate drops, especially on mobile or integrated GPUs.

Batch rendering addresses this problem by allowing you to submit many sprites (quads) in one go. Instead of calling draw(...) a thousand times, we build a large buffer with all the data and make a single draw call. This massively reduces CPU overhead and allows the GPU to work far more efficiently.

Why Batching Matters

Batching significantly improves performance by reducing how often we interrupt the GPU. Without batching, the GPU is repeatedly told to stop, rebind resources, and draw — like making a chef stop every 5 seconds to grab a new ingredient. With batching, we hand the chef everything at once.

This isn’t just theoretical: when a frame is limited by draw-call overhead, batching can raise the framerate by orders of magnitude. Consolidating all geometry into a single buffer and issuing one draw call leaves the CPU more time for gameplay logic and feeds the GPU a steady stream of data.

Here’s a simplified comparison:

Method	Draw Calls	Estimated FPS
No batching	10,000	~15 FPS
With batching	1	~1000+ FPS

How It Works

The batching process is relatively straightforward:

We write all the vertex data for all visible sprites into a large, contiguous vertex buffer.
We write all indices (for drawing quads as triangles) into a shared index buffer.
Once the buffers are filled, we bind the shared pipeline state, texture, and buffers.
Finally, we issue a single draw call with drawIndexedPrimitives(...).

This approach minimizes CPU–GPU communication, reduces render state changes, and lets the GPU process one large, contiguous dataset instead of many small ones.
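
The index-buffer step above can be sketched as follows. This is a minimal sketch, not a definitive implementation: `BuildQuadIndices` is a hypothetical helper (not part of Metal), it assumes each quad stores its four vertices back to back in the order bottom-left, bottom-right, top-right, top-left, and it assumes 16-bit indices, which caps one batch at 16,384 quads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: builds the shared index list for `quadCount` quads.
// Each quad contributes 4 vertices and 6 indices (two triangles).
std::vector<std::uint16_t> BuildQuadIndices(std::size_t quadCount) {
    // 16-bit indices can address at most 65,536 vertices (assumption: uint16).
    assert(quadCount * 4 <= 65536);

    // Per-quad pattern: triangle (0, 1, 2) and triangle (2, 3, 0).
    const std::size_t pattern[6] = {0, 1, 2, 2, 3, 0};

    std::vector<std::uint16_t> indices;
    indices.reserve(quadCount * 6);
    for (std::size_t quad = 0; quad < quadCount; ++quad) {
        const std::size_t base = quad * 4;  // first vertex of this quad
        for (std::size_t offset : pattern) {
            indices.push_back(static_cast<std::uint16_t>(base + offset));
        }
    }
    return indices;
}
```

For 1,000 sprites this produces 6,000 indices referring to 4,000 vertices, all of which can be consumed by one drawIndexedPrimitives(...) call. Because the pattern never changes, the index list can be built once and reused every frame.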

flowchart TB
    A["Many Sprites (Quads)"] --> B["Build Vertex Buffer"]
    A --> C["Build Index Buffer"]
    B --> D["Upload to GPU"]
    C --> D
    D --> E["One Draw Call"]
    E --> F["1000+ sprites on the screen"]

Key Requirements

To ensure sprites can be batched together, they must meet certain constraints. They need to share the same pipeline state, use the same render pass, and ideally rely on the same texture (or a texture atlas). Any deviation — like switching to a different shader, or using a different blend mode — will require flushing the current batch and starting a new one.

This is why games often use texture atlases: to avoid switching textures mid-frame and breaking the batch.
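
To make the cost of a broken batch concrete, here is a small sketch that counts how many draw calls a given submission order would need. Everything in it is an assumption for illustration: `BatchKey` and `CountBatches` are hypothetical names, and the integer IDs stand in for a real pipeline state object, texture, and blend mode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical state key. Two sprites can share a batch only if every
// field matches; real code would compare the actual Metal objects.
struct BatchKey {
    std::uint32_t pipelineId;
    std::uint32_t textureId;
    std::uint32_t blendMode;

    bool operator==(const BatchKey& other) const {
        return pipelineId == other.pipelineId &&
               textureId == other.textureId &&
               blendMode == other.blendMode;
    }
};

// Counts the batches (draw calls) needed for sprites submitted in this
// order: a new batch starts whenever the key differs from the previous one.
std::size_t CountBatches(const std::vector<BatchKey>& submissionOrder) {
    if (submissionOrder.empty()) return 0;

    std::size_t batches = 1;
    for (std::size_t i = 1; i < submissionOrder.size(); ++i) {
        if (!(submissionOrder[i] == submissionOrder[i - 1])) ++batches;
    }
    // There can never be more batches than sprites.
    assert(batches <= submissionOrder.size());
    return batches;
}
```

Four sprites that alternate between two textures need four batches in this model, while the same sprites grouped by texture need two. With a single atlas the texture ID never changes, so the count drops to one.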

Sprite Vertex Format

A 2D sprite is typically represented as a quad — two triangles composed of four vertices. When batching, we append these quads into the buffer one after another.

Each vertex usually carries:

Position in 2D or 3D space
Texture coordinates (UV)
Per-vertex color (optional)
A texture index if using a texture array (with a single atlas, the UVs alone select the sub-image)

Here’s an example vertex structure in C++:

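#include <simd/simd.h>

// One vertex of a batched sprite quad.
struct SpriteVertex {
    simd::float2 position; // position in 2D space
    simd::float2 uv;       // texture coordinates
    simd::float4 color;    // per-vertex color modulation
    float texIndex;        // which texture in the array to sample
};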

This layout gives the shaders everything they need: the UVs determine which part of the texture to sample, and the color determines what modulation to apply. The texture index selects one of several textures in a single array, which extends batching further without rebinding resources.
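
As an illustration of how such vertices might be filled in, the sketch below builds the four vertices of one axis-aligned quad and derives its UVs from a pixel rectangle inside an atlas. It is a sketch under stated assumptions: `Float2` and `Float4` are plain stand-ins for simd::float2 and simd::float4 so the code compiles without Apple's headers (they do not reproduce simd alignment), and `AtlasRegion` and `MakeQuad` are hypothetical names.

```cpp
#include <array>
#include <cassert>

// Plain stand-ins for simd::float2 / simd::float4 (assumption).
struct Float2 { float x, y; };
struct Float4 { float r, g, b, a; };

struct SpriteVertex {
    Float2 position;
    Float2 uv;
    Float4 color;
    float  texIndex;
};

// Hypothetical description of a sub-image inside an atlas, in pixels,
// measured from the atlas's top-left corner.
struct AtlasRegion { float x, y, width, height; };

// Builds a quad centered on `center`. Vertex order: bottom-left,
// bottom-right, top-right, top-left. Positions are assumed y-up while
// texture coordinates start at the top-left, so bottom vertices get v1.
std::array<SpriteVertex, 4> MakeQuad(Float2 center, Float2 size, Float4 color,
                                     AtlasRegion region, float atlasWidth,
                                     float atlasHeight, float texIndex) {
    assert(atlasWidth > 0.0f && atlasHeight > 0.0f);

    const float hx = size.x * 0.5f;
    const float hy = size.y * 0.5f;
    // Normalize the pixel rectangle into the 0..1 UV range.
    const float u0 = region.x / atlasWidth;
    const float v0 = region.y / atlasHeight;
    const float u1 = (region.x + region.width) / atlasWidth;
    const float v1 = (region.y + region.height) / atlasHeight;

    std::array<SpriteVertex, 4> quad{};
    quad[0] = SpriteVertex{{center.x - hx, center.y - hy}, {u0, v1}, color, texIndex};
    quad[1] = SpriteVertex{{center.x + hx, center.y - hy}, {u1, v1}, color, texIndex};
    quad[2] = SpriteVertex{{center.x + hx, center.y + hy}, {u1, v0}, color, texIndex};
    quad[3] = SpriteVertex{{center.x - hx, center.y + hy}, {u0, v0}, color, texIndex};
    return quad;
}
```

All four vertices carry the same color and texture index here; per-vertex colors would only differ if the sprite needed a gradient.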

Designing Our Batch Renderer

To keep things simple and maintainable, we will build a dedicated BatchRenderer class that encapsulates all the rendering complexity — vertex and index buffer management, GPU uploads, pipeline state binding, and the draw calls.

This separation means the main application code (e.g. main.cpp) remains clean and focused purely on game logic and sprite data, without needing to manage low-level graphics resources directly.

Why a Separate BatchRenderer?

Clear responsibilities: Only the batch renderer handles GPU buffers, shaders, and drawing.
Simplified sprite objects: Sprites don’t store or manage vertices, indices, or Metal buffers. They just hold their own properties — position, size, rotation, color, and texture reference.
Future-proofing: The renderer can be extended to support new features like texture atlases, different blending modes, or advanced batching strategies without changing sprite code.
Easier debugging and maintenance: Centralized rendering code is easier to optimize and fix.

What Will Sprites Contain?

Our sprites will be plain data objects holding:

Position (simd::float2 or simd::float3)
Size (simd::float2)
Rotation (angle in radians or degrees)
Color (simd::float4)
Texture identifier (e.g., filename or texture pointer)

They will not manage any GPU buffers, vertex formats, or draw logic.
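
One possible shape for such a sprite is sketched below. The field names, the defaults, the choice of radians, and the integer `TextureId` handle are all assumptions made for illustration; `Float2` and `Float4` are plain stand-ins for the simd types so the block compiles on its own.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Plain stand-ins for simd::float2 / simd::float4 (assumption).
struct Float2 { float x, y; };
struct Float4 { float r, g, b, a; };

// Hypothetical handle: an index into a texture table owned by the renderer.
using TextureId = std::uint32_t;

// Plain data only: no buffers, no vertex format, no draw logic.
struct Sprite {
    Float2    position{0.0f, 0.0f};
    Float2    size{1.0f, 1.0f};
    float     rotation = 0.0f;                 // radians (assumed)
    Float4    color{1.0f, 1.0f, 1.0f, 1.0f};   // opaque white by default
    TextureId texture = 0;
};

// A sprite that owns no resources can be copied byte for byte.
static_assert(std::is_trivially_copyable<Sprite>::value,
              "Sprite should stay a plain data object");

// Hypothetical convenience constructor with a basic sanity check.
Sprite MakeSprite(Float2 position, Float2 size, TextureId texture) {
    assert(size.x > 0.0f && size.y > 0.0f);
    Sprite sprite;
    sprite.position = position;
    sprite.size = size;
    sprite.texture = texture;
    return sprite;
}
```

Keeping the sprite trivially copyable means game code can store sprites in ordinary containers and hand them to the renderer by reference without any lifetime concerns.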

How Will the BatchRenderer Work?

Maintain large CPU-side arrays for vertices and indices internally.
Accept sprites via a simple API like BatchRenderer::SubmitSprite(const Sprite& sprite).
Convert sprite properties into vertex data inside the renderer (applying transformations like position, scale, rotation).
Keep track of how many sprites are batched.
When full or when explicitly requested, flush the batch: upload buffers and issue the draw call.
Handle pipeline and resource state changes transparently.
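
The responsibilities listed above can be sketched on the CPU side as follows. This is a minimal sketch, not the final renderer: `Float2` and `Float4` are plain stand-ins for the simd types, `Flush` and the accessor methods are hypothetical additions, the Metal upload and drawIndexedPrimitives(...) call are replaced by a counter so the block runs anywhere, and state-change handling is left out. Rotation is assumed to be in radians.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Plain stand-ins for simd::float2 / simd::float4 (assumption).
struct Float2 { float x, y; };
struct Float4 { float r, g, b, a; };
struct SpriteVertex { Float2 position; Float2 uv; Float4 color; float texIndex; };

// Minimal sprite, repeated here so the block compiles on its own.
struct Sprite {
    Float2 position;
    Float2 size;
    float rotation;              // radians (assumed)
    Float4 color;
    std::uint32_t textureIndex;  // hypothetical slot in a texture array
};

class BatchRenderer {
public:
    explicit BatchRenderer(std::size_t maxSprites) : maxSprites_(maxSprites) {
        // 16-bit indices limit one batch to 65,536 vertices.
        assert(maxSprites > 0 && maxSprites * 4 <= 65536);
        vertices_.reserve(maxSprites * 4);
        indices_.reserve(maxSprites * 6);
    }

    void SubmitSprite(const Sprite& sprite) {
        if (spriteCount_ == maxSprites_) Flush();  // batch full: draw it first

        const float c = std::cos(sprite.rotation);
        const float s = std::sin(sprite.rotation);
        const float hx = sprite.size.x * 0.5f;
        const float hy = sprite.size.y * 0.5f;
        // Corners relative to the center (BL, BR, TR, TL) and matching UVs.
        const Float2 corners[4] = {{-hx, -hy}, {hx, -hy}, {hx, hy}, {-hx, hy}};
        const Float2 uvs[4] = {{0.0f, 1.0f}, {1.0f, 1.0f}, {1.0f, 0.0f}, {0.0f, 0.0f}};

        const std::size_t base = spriteCount_ * 4;
        for (int i = 0; i < 4; ++i) {
            // Rotate the corner around the sprite center, then translate.
            const Float2 p = {sprite.position.x + corners[i].x * c - corners[i].y * s,
                              sprite.position.y + corners[i].x * s + corners[i].y * c};
            vertices_.push_back(SpriteVertex{p, uvs[i], sprite.color,
                                             static_cast<float>(sprite.textureIndex)});
        }
        const std::size_t pattern[6] = {0, 1, 2, 2, 3, 0};
        for (std::size_t offset : pattern) {
            indices_.push_back(static_cast<std::uint16_t>(base + offset));
        }
        ++spriteCount_;
    }

    void Flush() {
        if (spriteCount_ == 0) return;
        // Placeholder: real code would copy vertices_ and indices_ into GPU
        // buffers and encode one drawIndexedPrimitives(...) call here.
        ++drawCalls_;
        vertices_.clear();
        indices_.clear();
        spriteCount_ = 0;
    }

    std::size_t DrawCalls() const { return drawCalls_; }
    std::size_t PendingSprites() const { return spriteCount_; }
    const std::vector<SpriteVertex>& Vertices() const { return vertices_; }

private:
    std::size_t maxSprites_;
    std::size_t spriteCount_ = 0;
    std::size_t drawCalls_ = 0;
    std::vector<SpriteVertex> vertices_;
    std::vector<std::uint16_t> indices_;
};
```

Submitting five sprites to a renderer created with a capacity of two, followed by an explicit Flush(), records three draw calls in this model: two automatic flushes when the batch fills up and one for the remainder. Sizing the capacity to cover a typical frame keeps that number at one.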