Roll wants to recreate dolly shots and more using generative AI

Those familiar with Faizan Buzdar, who was until recently the VP of product management at Box, likely associate the entrepreneur with Convo, the digital workspace platform popular among newsrooms (including this one). But Buzdar, whose background is in electronics engineering, has long held a fascination with video and visual effects.

"A lifelong video and photography enthusiast, I'd been making videos on my own for years but noticed video production had largely remained manual with little innovation in recent decades, especially for time-consuming tasks like video editing," Buzdar told me via email. "Meanwhile, I noticed that iPhone camera and sensor technology had step-function improvements over the past few years, becoming almost equivalent in image quality to DSLRs."

So while at Box, Buzdar says that he decided to try combining video -- an increasingly popular medium -- with innovations in AI and machine learning to attempt to improve the video capturing and editing experience. Buzdar tapped Adeel Abbas, a video engineer who while at Twitter contributed to the infrastructure powering the site's livestreaming features, alongside Saj Khan, Fahad Yaqub and fellow Box exec Michelle Oh to explore the frontiers of tech-accelerated video production. 

Roll is the result. A new app for iOS, it delivers bokeh, multicam shots, motion graphics and -- perhaps most intriguing to me -- "AI-simulated" sliders, dollies and jibs.

Image Credits: Roll

"Our mission is to disrupt the world of high-quality video production, and become the new standard for video content creation," Buzdar continued. "Creating great video is a massive upfront investment in gear, equipment, learning how to use that gear, software for editing -- we’re getting rid of all of that."

Roll, which is aimed at the "prosumer" market (think influencers and podcasters, but also businesses creating their own marketing material), consists of two products: the Roll iPhone app and web app. The iPhone app captures and records video and then automatically uploads it to Roll's cloud for storage and processing. The web app, meanwhile, is where footage can be previewed, accessed, shared, downloaded and edited by one or a team of content creators.

Of course, video apps are a dime a dozen. So what makes Roll different? For one, the app's aimed at use cases that most camera apps aren't, Buzdar says -- like remote video interviews, video podcasts and customer testimonials. While Zoom, Microsoft Teams and Google Meet fill the need to some degree, Buzdar argues that they're not designed for "high-quality" video production.

Roll also employs a number of real-time effects to (ostensibly) offer a greater range of post-production choices than most video-capturing apps. For example, Roll records in the HEVC standard, which compresses video up to roughly twice as efficiently as the older H.264 standard, delivering higher image quality for the same file size. And Roll can record and process up to two camera shots -- a wide-angle shot and close-up shot -- at once, allowing users to create videos with effectively "multi-camera" perspectives.

The Roll editing interface. Image Credits: Roll

Granted, multicam isn't particularly unique -- Roll's far from the first app to offer it. But Buzdar says that where the magic lies is in the post-processing. Roll leverages generative AI to recreate rooms in 3D space so that content creators can move a video-game-like virtual camera around, simulating movements like panning from side to side with a dolly or crane.

"Today, generative AI is too often associated with creating fake content out of thin air," Buzdar said. "That's not our philosophy. We don't generate fake pixels, people or scenes. We're using generative AI purely as a tool for productivity -- we want to democratize access to higher quality video production."

Buzdar explained that Roll's AI was trained to understand the 3D depth in a scene, using data to measure depth and shapes independent of the person sitting in the room. Roll started training its algorithms with open source datasets commonly used for benchmarking in academia, but then internally recorded over 22,000 video calls, creating its own rich database.

The results aren't half bad -- at least in the demo footage that Buzdar showed me. Some of Roll's AI-generated pans verge on the uncanny valley, the result of unnatural warping on objects in the background as the virtual camera swivels by. But in short scenes, the AI effects are convincing enough -- and an eye-catching addition to what'd otherwise be a dull remote interview.

"We've researched this quite a bit, and we've not seen anyone use AI in the same way we are -- pairing iPhone sensor data with large AI models in the cloud," Buzdar said. "Our technology provides foundational capabilities to simulate any visual effects a user would want."

Call recording with Roll. Image Credits: Roll

"Any visual effects" sounds like a bit of a stretch. But Roll has other, more realistic algorithmic tricks up its sleeve. As Roll records video, it gathers metadata for use later in the video production process, including recording and lighting conditions, the distance from the camera to the subject and the position of the subject's face and body. The metadata's used to automatically adjust the cameras and sensors on the phone as well as provide feedback and instruction for composition and lighting.

Similar to a few other "AI-enabled" mobile video editors on the market, Roll also taps the metadata to create a fully realized, multicamera reel in its editing cloud -- no manual editing required. (Users can still change and adjust the camera angles or add camera movements and visual effects if they choose.) In the near future, Roll will be able to publish directly to social media, including TikTok, YouTube and Instagram -- in both the appropriate resolution and aspect ratio.

"Today, video production requires many pieces of hardware and software to fully complete," Buzdar said. "With every single step, when the video and audio file hops from one software to another, it loses context and just becomes a 'dumb' file that is passed around. We have fundamentally rebuilt the entire video production 'stack' from scratch. Cutting across traditional software boundaries, we’ve applied AI to deliver a transformative capture-to-publish experience that vertically integrates and automates the entire remote video production workflow."

So how does Roll plan to make money? The company has so far raised cash from traditional VC sources -- Buzdar wouldn't say where, exactly. But in terms of revenue generation, Buzdar hopes Roll will eventually grow to serve the needs of corporate organizations -- specifically their in-house marketing and video teams, who will pay some sort of fee for Roll's services.

"Video production is ripe for disruption from the cloud," Buzdar said. "Attributes like large file sizes, complex processing and the need for multiperson edits and review cycles make it the perfect candidate to have exponential benefits from cloud computing like scalable storage, AI, compute, and real-time sharing and collaboration."

There's surely truth to that. As for whether Roll will be the disruptor, time will tell.