It's kind of a pain, but one other option that might be worth looking into is rendering your animation frame by frame as PNGs (or WebP) and coding it as an image statement with an ATL block:
https://www.renpy.org/doc/html/atl.html ... -atl-block
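To give a rough idea of what that looks like, here's a minimal sketch of a four-frame looping animation. The image name and the file names are made up for illustration, and the 0.1 second pause is just a guess at a frame rate, so adjust all of that to match your own files:

```renpy
# Hypothetical frames exported from your animation software:
#   images/eileen_wave_01.png ... images/eileen_wave_04.png
image eileen wave:
    "images/eileen_wave_01.png"
    pause 0.1                      # hold each frame for 0.1 seconds
    "images/eileen_wave_02.png"
    pause 0.1
    "images/eileen_wave_03.png"
    pause 0.1
    "images/eileen_wave_04.png"
    pause 0.1
    repeat                         # loop back to the first frame

label start:
    show eileen wave
    "The animation keeps looping while this line is on screen."
    return
```

If you leave off the `repeat`, the animation plays once and stays on the last frame, which can be handy for one-off poses.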
You can combine this with layered images or ConditionSwitch (or both) to make it easier to do expressions and poses (or anything where only a relatively small part of the image changes). In addition to being a little more flexible, I think it usually reduces file size, and as an added bonus, it makes things easier if you ever consider a browser port (since webm currently isn't compatible with Ren'Py's HTML distribution builder).
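Here's a sketch of how that combination might look, assuming a character split into a base layer plus separate eye and mouth images. Every image name, file name, and the `is_night` variable are placeholders I made up for the example:

```renpy
# Only the mouth is animated, so only the mouth needs extra frames.
image eileen_mouth_talk:
    "images/eileen_mouth_open.png"
    pause 0.12
    "images/eileen_mouth_closed.png"
    pause 0.12
    repeat

layeredimage eileen:
    always:
        "images/eileen_base.png"       # body, hair, everything that never changes

    group eyes:
        attribute open default:
            "images/eileen_eyes_open.png"
        attribute closed:
            "images/eileen_eyes_closed.png"

    group mouth:
        attribute smile default:
            "images/eileen_mouth_smile.png"
        attribute talk:
            "eileen_mouth_talk"        # the ATL animation defined above

# ConditionSwitch picks a displayable based on a variable.
default is_night = False

image bg room = ConditionSwitch(
    "is_night", "images/bg_room_night.png",
    "True", "images/bg_room_day.png",  # fallback when nothing else matches
)
```

With something like this, `show eileen talk closed` would give you the talking mouth with closed eyes, and you never have to render that combination as its own set of frames.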
I actually tried to do a predominately webm-based game for NaNoRenO last year, and while it definitely works, it can get unexpectedly complicated (for example, I found it hard to time things so that an idle animation goes smoothly into a pose animation).
Depending on your goals, webm videos may still be the way to go. I would say it's a good choice for anything relatively high production value (or somewhat long) that isn't going to change a lot based on variables or player input. For something that's short and simple (e.g. lip flap or blinking) or something that's very "game-y" (affected by variables or player input), using Ren'Py's Animation and Transformation Language (ATL) is probably the way to go. It might seem complicated, but after the initial learning curve it becomes way easier.
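As an example of the "short and simple" case, here's a rough sketch of a blink that doesn't loop on a perfectly regular beat, using ATL's `choice` statement to pick one of several pause lengths each time through. The image name, file names, and timings are all assumptions on my part:

```renpy
image eileen_eyes_blink:
    "images/eileen_eyes_open.png"
    # One of these blocks is picked at random each loop,
    # so the blinks don't land on a fixed rhythm.
    choice:
        pause 2.0
    choice:
        pause 3.5
    choice:
        pause 5.0
    "images/eileen_eyes_closed.png"
    pause 0.15                     # eyes stay shut briefly
    repeat
```

Doing the same thing with a video would mean baking the timing into the file, which is part of why I'd lean toward ATL for this kind of thing.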