I like the idea of blending actions with speech. It takes more work to create, but it can work really well if done correctly. The issue is the sheer number of images you need. For instance, to indicate that it is hot, you can have your characters sweat. You can have them fan themselves. You can even have them pull at the collar of their shirt to let some fresh air in and to peel their sweaty clothes away from their skin. Sure, that is a minimum of 3 additional images for each character, but it adds more depth to the experience.
When it comes to writing, it depends on the story. If I'm going to be using a lot of dialogue, I prefer writing it like this:
Narration
Speaker 1:
Speaker 2:
If it's meant to be more narration-focused, I prefer this:
Narration
Speaker 1:
Narration
Speaker 2:
Narration
Generally, I go with a blend of the two. I try to use narration whenever the environment changes, dialogue whenever the characters have something to say, and visual cues whenever there is a change in the characters that can be shown. I find that this gives the story more depth and immersion.
I feel I should mention that there is no right way of doing things. Whatever works best for you is the right approach. Don't push yourself to write in a way that doesn't feel natural to you just to fit some kind of standard. In the end, what matters is that you are happy with what you made.