Adventures in video decoders: the quest for VP9a
Posted: Mon May 25, 2020 7:14 pm
Hello everyone!
I'm new. Let me start off by saying I appreciate all the work Tom and his cohorts have done to make Ren'Py and foster a very helpful community.
I've begun working with this game engine, and I'm beginning to test its limits. I happen to be a 3D animator, and I'd like to use this art style to make a game. I make 3D video files, and I'm happy that Ren'Py supports video. I'm especially intrigued by Movie Sprites/Displayables, because the alpha mask system would allow me to animate characters and have them inhabit all sorts of environments by placing them against different backdrops.
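Concretely, the kind of setup I have in mind is the documented Movie displayable with a separate mask video (file and tag names below are just placeholders):

```renpy
# A movie sprite with a separate alpha-mask video: the mask is a
# greyscale video (white = opaque, black = transparent) with the same
# size and duration as the main clip. File names are placeholders.
image eileen dancing = Movie(play="eileen.webm", mask="eileen_mask.webm")

label demo:
    scene bg street
    show eileen dancing at center
    pause
```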
I wanted to see how far I could push this method. After some stress tests, I've found that 720p video at 30 frames per second plays back smoothly. When I stretched it to 1080p at 60 fps, I ran into very low frame rates. At first I thought it was my antique of a workstation - it's about 12 years old. I tried the same settings on a laptop from 2016; fair enough, it is a laptop, but it does have a GeForce GTX 850M in it. I also tried my render nodes, which are about 30% faster than the old workstation in CPU terms but only have integrated graphics. Here are my results...
3.1 GHz quad-core render nodes, integrated graphics, 4 GB RAM - 4 FPS
2.2 GHz quad-core workstation, Radeon HD770, 8 GB RAM - 15-20 FPS
2.4 GHz dual-core laptop, NVIDIA GTX 850M, 8 GB RAM - 25 FPS
The hardware acceleration is certainly working! Naturally, I'm not likely to get anywhere near 60 FPS 1080p video on most computers, and even if a high-end gaming rig could pull it off, I'd rather not limit my audience that drastically. The thing I don't quite get is that the 10-year-old render nodes and the 12-year-old workstation will play a 60 FPS 1080p video just fine in a player like Media Player Classic, mpv or VLC.
So I figured the bottleneck lies in the way Ren'Py applies the alpha mask. I understand it would be quite computationally expensive to carve up a double-wide video, apply the mask on the right half to the video on the left half, and stitch the result back together, all in real time. I started playing around with different encoding techniques and stumbled upon the VP9 codec and the yuva420p pixel format. I encoded a video with ffmpeg, but ffplay, MPC, VLC and Ren'Py all seemed to ignore the alpha channel. Then I learned that ffmpeg's default VP9 decoder (whichever that is, I'm not sure) doesn't support this format's alpha - the file plays fine, it just drops the alpha channel. When I ran the same file in ffplay and mpv with arguments forcing them to decode with libvpx-vp9, hey presto - the embedded alpha channel worked! And still at a rock-solid 60 FPS on a 12-year-old toaster.
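For reference, here's roughly what I ran - the input file name and bitrate are placeholders, and the input itself has to carry alpha (e.g. a ProRes 4444 render or a PNG sequence):

```shell
# Encode VP9 with an alpha channel (yuva420p). File names and the
# bitrate are placeholders for whatever your render pipeline produces.
ffmpeg -i input.mov -c:v libvpx-vp9 -pix_fmt yuva420p -b:v 4M output.webm

# The native VP9 decoder drops the alpha plane, so force libvpx:
ffplay -vcodec libvpx-vp9 output.webm
mpv --vd=lavc:libvpx-vp9 output.webm
```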
I understand that Ren'Py uses libav to play videos. Where I get out of my depth is figuring out how to coax it into using the libvpx-vp9 decoder instead of whatever default it picks. It feels like I'm only a couple of steps from success, but I could be delusional. I dug deeper and deeper until I ended up in ffmedia.c in Ren'Py's module folder, trying to hack it into forcing libvpx-vp9 as the decoder, but I'm very out of my depth and might be in completely the wrong place. I noticed a struct called "AVCodecContext", although I don't know what "context" means in the... context of this... context. Here's my sad attempt at code whispering so you can have a laugh:
Code: Select all

static AVCodecContext *find_context(AVFormatContext *ctx, int index) {
    AVCodec *codec = NULL;
    AVCodecContext *codec_ctx = NULL;
    AVCodecContext *codec_ctx_orig;

    if (index == -1) {
        return NULL;
    }

    codec_ctx_orig = ctx->streams[index]->codec;

    /* My caveman attempt at hard coding: prefer libvpx for VP9 streams,
       since the native decoder drops the alpha channel. */
    if (codec_ctx_orig->codec_id == AV_CODEC_ID_VP9) {
        codec = avcodec_find_decoder_by_name("libvpx-vp9");
    }

    /* Fall back to the default decoder for everything else (or if
       libvpx isn't compiled in). That's it. Really. That's the best
       I can do. */
    if (codec == NULL) {
        codec = avcodec_find_decoder(codec_ctx_orig->codec_id);
    }

    if (codec == NULL) {
        return NULL;
    }

    codec_ctx = avcodec_alloc_context3(codec);

    if (codec_ctx == NULL) {
        return NULL;
    }

    if (avcodec_copy_context(codec_ctx, codec_ctx_orig)) {
        goto fail;
    }

    if (avcodec_open2(codec_ctx, codec, NULL)) {
        goto fail;
    }

    return codec_ctx;

fail:
    avcodec_free_context(&codec_ctx);
    return NULL;
}

Another method I was considering was finding a Python wrapper around ffmpeg that's flexible enough to let me choose the decoder, then crowbarring it into Ren'Py by way of a custom displayable. But that seems like reinventing the wheel, does it not?

So as you can see, I can't tell whether I'm missing something right under my nose or tilting at windmills. I do need the transparency; 60 FPS would be "nice". I'm just wondering whether I can have my cake and eat it. Given how quickly these videos decode in a standalone player, I know I can't expect quite the same efficiency inside a game engine, but surely there's some way to squeeze it in there?

Thank you so much for listening to me waffle on for so long.