Framing/Scaling Difference: /talks Idle Video vs. /talks/streams Output

Hello D-ID Support,

We are developing a real-time conversational avatar application using the Talks Streams API (/talks/streams).

To implement an idle animation, we are following the best practice outlined in your documentation:
We pre-generate a silent, looping idle video using the standard /talks endpoint (POST /talks).
We host this idle MP4 on GCS.

The Problem:
Both the idle video request and the stream initiation use the same 512x512 source_url, the same driver_url, and fluent: true. Despite this, the framing and scaling of the avatar differ noticeably between the idle video and the talking video stream.
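
For reference, the shared setup can be sketched roughly as follows. This is a minimal illustration rather than our exact code: the URLs are hypothetical placeholders, the helper names are made up for this example, and the script that produces the silent idle clip is omitted. It only shows that source_url, driver_url, and config.fluent are taken from one shared object for both request bodies.

```typescript
// Fields that are meant to be identical for POST /talks and POST /talks/streams.
interface SharedAvatarFields {
  source_url: string;
  driver_url: string;
  config: { fluent: boolean };
}

// Hypothetical placeholder values, not real asset URLs.
const shared: SharedAvatarFields = {
  source_url: "https://example.com/avatar-512.png",
  driver_url: "https://example.com/driver.mp4",
  config: { fluent: true },
};

// Copy the shared fields so neither request can mutate the other's config.
function copyShared(fields: SharedAvatarFields): SharedAvatarFields {
  return { ...fields, config: { ...fields.config } };
}

// Body for the idle video request (script for silence intentionally omitted).
const idleTalkBody = copyShared(shared);

// Body for the stream initiation request.
const streamInitBody = copyShared(shared);

// True when both bodies carry exactly the same framing-related fields.
function sameAvatarFields(a: SharedAvatarFields, b: SharedAvatarFields): boolean {
  return (
    a.source_url === b.source_url &&
    a.driver_url === b.driver_url &&
    a.config.fluent === b.config.fluent
  );
}
```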

When we switch between the idle video and the talking video stream, there is a visible jump: the avatar appears slightly zoomed in or out relative to the other video. This happens even with object-fit: contain applied in CSS to both video elements inside a 1:1 container, and it makes the transition jarring.
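
One way to rule out the CSS as the cause is to compute where each video's frame lands inside the container under object-fit: contain. The sketch below is an illustration under assumptions: containRect is a hypothetical helper, and it assumes the default centered object-position. If the idle video and the stream yield the same rectangle, the remaining difference must be inside the rendered frames themselves.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Rectangle a video frame occupies inside its box under `object-fit: contain`
// with the default centered object-position.
function containRect(videoW: number, videoH: number, boxW: number, boxH: number): Rect {
  // `contain` scales uniformly until the frame just fits in both dimensions.
  const scale = Math.min(boxW / videoW, boxH / videoH);
  const width = videoW * scale;
  const height = videoH * scale;
  // Leftover space is split evenly on both sides of the frame.
  return { x: (boxW - width) / 2, y: (boxH - height) / 2, width, height };
}
```

In the browser this could be called with video.videoWidth and video.videoHeight for each element, together with the container's clientWidth and clientHeight. Two square videos of different resolutions produce the same rectangle in a 1:1 container, so a resolution difference alone would not explain the jump.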

Our Questions:
1. Is this difference in framing/scaling between /talks output and /talks/streams output expected behavior, even with an identical source image and identical configuration (fluent, driver_url)?
2. Are there any additional parameters (perhaps within the config object for either /talks or /talks/streams) that ensure the composition, padding, or framing of the avatar is identical between the pre-generated idle video and the real-time stream?
3. What is the recommended best practice for a visually seamless transition (no scaling/framing jump) between a pre-generated /talks idle video and a /talks/streams talking video when using a custom avatar source_url?

Thank you for your assistance.