4D splat format

A volumetric video format that stores a moving scene as a cloud of time-aware Gaussian "splats" — letting a viewer move the camera freely through recorded motion instead of watching a fixed 2D frame.

Last verified · 2026-07-04 · by Moe Ameen

What it is

A 4D splat format stores a dynamic scene not as a flat grid of pixels but as a cloud of soft, translucent 3D blobs ("splats"), each carrying a position, an ellipsoidal shape, a color, an opacity, and, in the 4D case, a description of how those change over time. "3D" gives you geometry you can orbit around; the fourth dimension is time, so the whole scene moves. Rendering it back out is called splatting: the engine projects every blob to the screen and alpha-composites them in depth order. Because that is rasterization rather than repeated queries to a neural network, it runs in real time on a GPU. The technical name for the dominant approach is 4D Gaussian splatting (4DGS), an extension of the 3D Gaussian splatting method that broke out in 2023.
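
The depth-ordered compositing step can be sketched for a single pixel. This is a minimal illustration, not any particular renderer's code: the `ProjectedSplat` layout and the `composite_pixel` name are hypothetical, and it assumes each splat has already been projected to the screen and reduced to a depth, a color, and an opacity at that pixel.

```python
from dataclasses import dataclass


@dataclass
class ProjectedSplat:
    """One splat's contribution at a single pixel (hypothetical layout)."""
    depth: float                       # distance from the camera
    color: tuple[float, float, float]  # RGB, each channel in [0, 1]
    alpha: float                       # opacity at this pixel, in [0, 1]


def composite_pixel(splats: list[ProjectedSplat]) -> tuple[float, float, float]:
    """Blend splats front to back with the standard 'over' operator."""
    r = g = b = 0.0
    transmittance = 1.0  # fraction of light still passing through
    for s in sorted(splats, key=lambda s: s.depth):  # nearest splat first
        weight = s.alpha * transmittance
        r += weight * s.color[0]
        g += weight * s.color[1]
        b += weight * s.color[2]
        transmittance *= 1.0 - s.alpha
        if transmittance < 1e-4:  # pixel is effectively opaque, stop early
            break
    return (r, g, b)
```

A half-transparent red splat in front of an opaque blue one comes out as an even mix: `composite_pixel([ProjectedSplat(1.0, (1, 0, 0), 0.5), ProjectedSplat(2.0, (0, 0, 1), 1.0)])` returns `(0.5, 0.0, 0.5)`.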

The practical difference from a normal MP4 is that an MP4 is a recording from one camera position: the director already chose the shot. A 4D splat scene stores the geometry and appearance of the whole space, so at playback time the viewer (or a VFX artist, or a game engine) chooses the camera: orbit the subject, dolly in, or freeze time and walk around a frozen moment. Re-lighting is possible too, though it generally needs extra processing because a plain capture stores lighting baked into each splat's color. That "free-viewpoint" property is what makes people call it volumetric video or holographic video rather than just video.

Under the hood, each 4D Gaussian is a primitive with a 4-dimensional mean and a 4×4 covariance — an anisotropic ellipsoid that can stretch and rotate through space and time — and its color is encoded with 4D spherindrical harmonics so it can look different from different angles and at different moments. Because the primitives are explicit and optimizable (unlike a NeRF, which bakes the scene into an implicit neural network you have to query), 4DGS renders far faster: research implementations report real-time frame rates at HD, which is the whole point of the format existing.
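
To make the 4D mean and 4×4 covariance concrete, here is a sketch of how one primitive can be "sliced" at a playback timestamp using standard multivariate Gaussian conditioning. The function name `slice_at_time` and the plain nested-list layout (axes ordered x, y, z, t) are assumptions for illustration; real implementations run this on the GPU and may parameterize the covariance differently.

```python
import math


def slice_at_time(mean, cov, t):
    """Condition a 4D Gaussian over (x, y, z, t) on a timestamp t.

    Returns (mean_3d, cov_3d, weight): the 3D Gaussian visible at time t,
    plus a temporal weight in (0, 1] that fades the splat out as t moves
    away from the splat's own time center.
    """
    var_t = cov[3][3]                      # temporal variance
    dt = t - mean[3]                       # offset from the time center
    cross = [cov[i][3] for i in range(3)]  # space-time covariance terms
    # Conditional mean: the center drifts linearly with dt.
    mean_3d = [mean[i] + cross[i] * dt / var_t for i in range(3)]
    # Conditional covariance: Schur complement of the temporal block.
    cov_3d = [
        [cov[i][j] - cross[i] * cross[j] / var_t for j in range(3)]
        for i in range(3)
    ]
    weight = math.exp(-0.5 * dt * dt / var_t)
    return mean_3d, cov_3d, weight


# Hypothetical splat centered at the origin at t = 1.0, drifting along x.
mean = [0.0, 0.0, 0.0, 1.0]
cov = [
    [1.0, 0.0, 0.0, 0.25],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.25, 0.0, 0.0, 0.25],
]
center, shape, weight = slice_at_time(mean, cov, t=1.5)
```

The off-diagonal space-time terms are what encode motion: in this example the ratio `0.25 / 0.25` acts as a velocity of one unit per second along x, so half a second after the time center the splat sits at x = 0.5 and has faded to a weight of about 0.61.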

The honest caveat: there is no single agreed 4D splat file format yet. Static Gaussian splats are converging on standards — the Khronos glTF `KHR_gaussian_splatting` extension reached release-candidate status in early 2026, and OpenUSD and Cesium 3D Tiles support them — but the 4D (moving) case is still fragmented, with each research project and vendor using its own on-disk layout and compression scheme. Files are also large: streaming a moving volumetric capture is measured in tens to low-hundreds of megabits per second before compression, which is why most of the 2026 work is about hierarchical, progressively-streamed encodings rather than the primitive itself.
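
The bandwidth arithmetic, and the idea behind a hierarchical encoding, can be sketched in a few lines. Everything here is illustrative: `stream_mbps` and `pick_layers` are hypothetical helpers, the per-frame payload and layer costs are made-up numbers, and the sketch assumes layers are cumulative (a detail layer is only usable if every layer before it arrived).

```python
def stream_mbps(bytes_per_frame: int, fps: float) -> float:
    """Bitrate in megabits per second for a fixed per-frame payload."""
    return bytes_per_frame * 8 * fps / 1_000_000


def pick_layers(layer_mbps: list[float], budget_mbps: float) -> int:
    """Return how many layers, base layer first, fit in the bandwidth budget.

    Layers are treated as cumulative, so selection stops at the first layer
    that would push the total over the budget.
    """
    total = 0.0
    count = 0
    for cost in layer_mbps:
        if total + cost > budget_mbps:
            break
        total += cost
        count += 1
    return count


# Hypothetical numbers: 250 kB of splat updates per frame at 30 fps.
raw = stream_mbps(250_000, 30)                    # 60.0 Mbps
usable = pick_layers([8.0, 12.0, 40.0], 25.0)     # 2 layers fit
```

With these assumed figures the raw stream needs 60 Mbps, while a viewer on a 25 Mbps connection would decode the base layer plus one refinement layer and skip the heaviest one.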

The history

The lineage starts with NeRF (Neural Radiance Fields, 2020), which proved you could reconstruct a photorealistic 3D scene from ordinary photos — but it was slow to train and slow to render because the scene lived inside a neural network. In 2023, "3D Gaussian Splatting for Real-Time Radiance Field Rendering" (Kerbl et al., SIGGRAPH 2023) replaced the network with millions of explicit Gaussian blobs and a fast rasterizer, collapsing render times from seconds-per-frame to real time. That paper is the reason "splat" entered the creator vocabulary.

The move to 4D followed almost immediately. Through 2024 a wave of methods — deformation-field approaches that warp a canonical 3D splat cloud over time, and native-4D approaches that treat time as a real fourth axis of each Gaussian — extended splatting to moving scenes. By 2026 the work had shifted from "can we do it" to "can we ship it": the research focus moved to compression and streaming (hierarchical bitstreams that decode more or less detail based on bandwidth) and the format began early commercial deployment, initially in film, VFX, and sports broadcast, with vendors like Gracia and Evercoast productizing capture-to-playback pipelines and browser/WebXR streaming.

How it behaves across platforms

Platform	Behavior
Web / WebXR	Renders in-browser with no install via WebGL/WebGPU. The delivery target most creators will actually touch — a splat scene embedded on a page or in a WebXR experience. Progressive streaming lets it start rendering before the full file downloads.
VR / AR headset	The native home for the format — free-viewpoint and depth are the whole value proposition in a headset. Playback quality is gated by the headset GPU and the scene's splat count; heavy captures get decimated for standalone headsets.
Film / VFX pipeline	Imported into DCC tools (Cinema 4D, Unreal, Blender via plugins) as a relightable, re-frameable element. Editors treat a captured performer as a 3D asset they can shoot from any angle in post rather than a locked plate.
Social feeds (Instagram, TikTok, YouTube)	No feed renders a live 4D splat scene — they play flat 2D video. To post from a splat scene you render a fixed camera path out to an MP4 first, which throws away the free-viewpoint property but gives you a normal short. This export step is where a repurposing pipeline picks it up.
Game engines (Unreal, Unity)	Splat scenes drop in as a rendering layer via engine plugins, useful for photoreal environments and captured characters. Real-time budget and collision/interaction (splats have no mesh) are the practical limits.

Concrete examples

A product studio captures a shoe rotating on a turntable as a 4D splat scene. On the site it ships as a WebXR embed the shopper can spin and inspect; for paid social, the team renders three fixed camera moves out to vertical MP4s and posts those as normal shorts.
A sports broadcaster reconstructs a goal as volumetric video so the replay can fly the camera to any angle — including impossible ones no physical camera occupied. The broadcast graphic is a rendered path through the splat scene, not a live-rendered splat.
A VFX team captures a stunt performer once as a 4D splat, then reframes the same performance for a wide, a close-up, and an over-the-shoulder in the edit — no reshoot, because the geometry is stored, not just one camera's pixels.
A creator experimenting with the format captures a walkaround of their workshop, exports a 20-second orbiting hero shot to MP4, and runs that clip through a repurposing tool to spin up captioned shorts, a carousel of stills, and a blog embed — treating the splat render as source footage, not the final post.
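
The "fixed camera path" in these examples is just a list of camera poses, one per output frame, handed to whatever renderer bakes the MP4. A minimal sketch of an orbit path follows. The `orbit_path` helper is hypothetical, it assumes a Y-up coordinate system (engines differ), and it only produces positions and a look-at target, not a full camera matrix.

```python
import math


def orbit_path(center, radius, height, seconds, fps, turns=1.0):
    """Return a list of (position, look_at) pairs for a circular orbit.

    The camera circles `center` at a fixed horizontal `radius` and a fixed
    `height` above it, always looking at the center. Y is assumed to be up.
    """
    frames = int(round(seconds * fps))
    cx, cy, cz = center
    path = []
    for i in range(frames):
        angle = 2.0 * math.pi * turns * i / frames
        position = (
            cx + radius * math.cos(angle),
            cy + height,
            cz + radius * math.sin(angle),
        )
        path.append((position, center))
    return path


# A 20-second, single-turn orbit at 30 fps: one pose per rendered frame.
poses = orbit_path(center=(0.0, 0.0, 0.0), radius=2.0, height=1.5,
                   seconds=20, fps=30)
```

Here `len(poses)` is 600, one pose for each frame of the exported clip. Once those frames are rendered and encoded, the result is ordinary video and the camera choice is locked in.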

Common mistakes

Assuming a 4D splat file will "just play" on Instagram or TikTok. Feeds render flat 2D video; you have to bake a fixed camera path out to MP4 first, which discards the free-viewpoint property the format exists for.
Confusing 3D and 4D splats. A 3D Gaussian splat is a frozen scan — one moment you can orbit. 4D adds motion. If the subject moves, a 3D capture will smear or ghost.
Expecting a single portable file format. In 2026 there is no ratified 4D interchange standard; a scene authored in one tool may not open in another. Pin your pipeline to a specific tool chain rather than assuming portability.
Underestimating file size and bandwidth. Uncompressed moving splat captures run to tens or low hundreds of megabits per second. Plan for hierarchical, progressive streaming rather than a raw download for anything over a few seconds.
Treating splats like meshes. Splats have no surface geometry, so collision, physics, and clean UV texturing do not come for free in a game engine — they are a rendering layer, not a swappable mesh.
Chasing photorealism at the wrong splat count. Detail scales with the number of Gaussians, and that directly costs GPU frame time on the viewer's device. A capture that is gorgeous on a workstation can stutter on a phone or standalone headset.
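
The splat-count trade-off in the last item can be illustrated with a naive decimation pass that trims a capture to a per-device budget. This is a sketch under loud assumptions: the `Splat` layout is hypothetical, and the importance score (opacity times ellipsoid volume) is a simple stand-in heuristic, not the pruning criterion of any specific tool.

```python
from dataclasses import dataclass


@dataclass
class Splat:
    opacity: float                     # in [0, 1]
    scale: tuple[float, float, float]  # ellipsoid radii along its three axes


def importance(s: Splat) -> float:
    """Hypothetical score: large, opaque splats matter most to the image."""
    sx, sy, sz = s.scale
    return s.opacity * sx * sy * sz


def decimate(splats: list[Splat], budget: int) -> list[Splat]:
    """Keep at most `budget` splats, preferring the highest scores."""
    if budget >= len(splats):
        return list(splats)
    return sorted(splats, key=importance, reverse=True)[:max(budget, 0)]
```

A pipeline might call `decimate(scene, budget)` with a larger budget for a workstation build and a smaller one for a phone or standalone headset, accepting some loss of fine detail in exchange for a stable frame rate.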

The honest take

The 4D splat format is one of the more genuinely exciting formats to appear recently, and also one of the most misunderstood by creators, because the thing that makes it special (the viewer picks the camera) is exactly the thing that disappears the moment you post to a feed. A 4D splat scene is a source asset, not a finished post. It lives in headsets, WebXR embeds, and VFX timelines. The second it needs to appear on Instagram, TikTok, LinkedIn, or YouTube, someone renders a fixed camera move out to a flat MP4, and from that point on it is just really good footage.

That render-to-flat moment is where a tool like Kompozy is relevant, and it is worth being precise about the boundary. Kompozy does not capture or render splat scenes; that is upstream, in the capture rig and the reconstruction pipeline. What Kompozy is built to do is take the flat deliverable that comes out the other end (the orbiting hero shot, the fly-through, the volumetric replay rendered to video) and turn it into a month of distribution: clip it into [vertical shorts](/glossary/vertical-video) with burned-in captions, cut stills into a carousel, draft the launch blog and newsletter around it, and schedule the whole set across nine platforms from one source. So the useful mental model is a two-stage stack: the splat pipeline owns capture and free-viewpoint rendering; the content generation and publishing engine owns everything after the export. Confusing the two, by expecting a feed to render volumetric video or a publishing tool to reconstruct a scene, is where most of the 2026 hype gets it wrong.

Frequently asked questions

What is a 4D splat format?

It is a volumetric video format that stores a moving scene as a cloud of time-aware Gaussian "splats" — soft 3D blobs that carry position, shape, color, opacity, and how those change over time. Because it stores geometry rather than a single camera's pixels, a viewer can move the camera freely through the recorded motion. The dominant technique is called 4D Gaussian splatting (4DGS).

How is a 4D splat different from a normal video file?

A normal MP4 records one fixed camera view; the shot is already chosen. A 4D splat scene stores the geometry and appearance of the whole space over time, so playback can orbit, dolly, or freeze a moment and walk around it. Re-lighting the scene is also possible, though it generally needs extra processing. That free-viewpoint property is why it is called volumetric or holographic video.

What is the difference between 3D and 4D Gaussian splatting?

3D Gaussian splatting captures a static scene — one frozen moment you can orbit around. 4D adds time as a fourth dimension, so the whole scene moves. If the subject is moving, you need 4D; a 3D capture of motion will ghost or smear.

Can I post a 4D splat scene directly to TikTok or Instagram?

No. Social feeds render flat 2D video, not live volumetric scenes. To post from a splat scene you render a fixed camera path out to a normal MP4 first, which gives up the free-viewpoint feature but produces a standard short you can distribute anywhere.

Is there a standard 4D splat file format yet?

Not for the moving case. Static Gaussian splats are converging on standards (the Khronos glTF `KHR_gaussian_splatting` extension reached release-candidate status in early 2026), but 4D formats remain fragmented across research projects and vendors, each with its own on-disk layout and compression.

How does 4D splatting relate to NeRF?

NeRF (2020) reconstructs a scene into an implicit neural network, which is slow to render. Gaussian splatting (2023) stores the scene as explicit, optimizable blobs and rasterizes them, which renders in real time. 4D splatting extends that explicit approach to moving scenes, keeping the real-time speed advantage over NeRF-style methods.

Where does a tool like Kompozy fit with volumetric video?

Kompozy does not capture or render splat scenes — that is the capture rig and reconstruction pipeline's job. Kompozy comes after the export: it takes the flat MP4 rendered from a splat scene and turns it into clipped vertical shorts, carousels, a blog, and a newsletter, then schedules them across nine platforms from that one source.

Related terms

Avatar video — AI-generated talking-head video where a digital avatar speaks a written script using voice cloning or synthetic voice.
Short-form video — Vertical or square video typically under 60–90 seconds, optimized for feed scrolling and algorithmic discovery on Reels, Shorts, and TikTok.
Vertical video — Video shot or cropped at 9:16 aspect ratio, optimized for phone-held viewing on Reels, Shorts, TikTok, and Stories.

Related deep guides

AI Content Repurposing — The complete methodology for turning one source into 25-35 pieces of native-format content across every platform — without producing AI slop.
Autonomous Content Creation — Most "autonomous" AI content is slop.
AI Brand Voice & Persona — Without a Persona Brief, every AI output averages to the LLM default voice.
