Streaming Error / Lips aren't moving

Hi, I purchased 60 credits and, after a lot of debugging, am down to 53 with no success.

I'm not sure what's happening. I get both a video and an audio stream, but the video appears to be a static image with closed lips, even though I can hear the audio.

The WebRTC connection seems okay, although eventually the onIceCandidate event fires and requests start failing with "too many requests". I don't think I'm causing that on my end; WebRTC seems to gather candidates on its own. Any tips on handling this? The connection state is still "connected" and the signaling state is still "stable".
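
For reference, here is a minimal sketch of one way I could cut down the number of candidate requests, assuming each candidate is forwarded with its own HTTP call. `makeCandidateFilter` is a hypothetical helper of mine, not something from the sample repo, and the payload field names follow my reading of that repo. I don't know whether duplicate candidates are actually what triggers the "too many requests" response.

```javascript
// Hypothetical helper: returns a function that turns an icecandidate event
// into a request payload, or null when nothing should be sent.
function makeCandidateFilter(sessionId) {
  const seen = new Set();
  return function toPayload(event) {
    // A null candidate marks the end of gathering; nothing to forward.
    if (!event || !event.candidate) return null;
    const { candidate, sdpMid, sdpMLineIndex } = event.candidate;
    // Skip empty candidate strings and ones that were already forwarded.
    const key = `${sdpMid}|${candidate}`;
    if (!candidate || seen.has(key)) return null;
    seen.add(key);
    // Assumed payload shape for the ICE request.
    return { candidate, sdpMid, sdpMLineIndex, session_id: sessionId };
  };
}
```

The idea is to call `toPayload(event)` inside the onIceCandidate handler and only send a request when it returns a non-null payload.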

I'm also struggling with the idle.mp4 convention. The source_url images are uploaded by end users, so I guess I need to dynamically generate an idle.mp4 for each AI as well?
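
Here is a rough sketch of the request body I think I would need per uploaded image, based on the comment in the sample repo (a text script containing only SSML breaks, sent to POST https://api.d-id.com/talks with `config.fluent: true`). `buildIdleTalkBody` is a hypothetical helper, and the `ssml: true` flag and the break duration are my assumptions, so please correct me if the fields are wrong.

```javascript
// Hypothetical helper: builds the JSON body for an idle-video request.
// Only the body is built here; sending it needs the usual auth headers.
function buildIdleTalkBody(sourceUrl, voiceId, seconds = 5) {
  // One SSML break per second of silence (assumed to be accepted as-is).
  const breaks = '<break time="1000ms"/>'.repeat(seconds);
  return {
    source_url: sourceUrl,
    script: {
      type: 'text',
      ssml: true, // assumption: tells the API to treat the input as SSML
      input: breaks,
      provider: { type: 'microsoft', voice_id: voiceId },
    },
    // Same stitch setting as my streaming request, plus fluent for a seamless loop.
    config: { fluent: true, stitch: true },
  };
}

const idleBody = buildIdleTalkBody('https://example.com/avatar.png', 'en-US-JennyNeural');
console.log(JSON.stringify(idleBody, null, 2));
```

The example URL is a placeholder; in my case it would be the end user's uploaded image.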

I'm building a platform of AI models/personas, and one of the features lets clients create their own AI avatar and configure its persona. I have a client with a Ph.D. in social work who was awarded a research grant, and she wants to configure AIs to simulate social work interviews for her undergrad students (instead of interviewing real human test subjects). If this is successful, I think I would be a long-term D-ID customer and would build in other interview simulation use cases.

I would really love to get this working so we can help each other grow. Let me know if I can provide more details.

Maybe I need to adjust the script parameters? Below is my test function:

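  # Sends a text script to an existing D-ID stream.
  # img_url is accepted but not used in this request.
  def quick_talk(voice_id, text = '', stream_id, session_id, img_url)
    body = {
      session_id: session_id,
      config: {
        stitch: true
      },
      script: {
        type: "text",
        subtitles: false, # boolean rather than the string "false"
        provider: {
          type: "microsoft",
          voice_id: voice_id # was hardcoded to 'en-US-JennyNeural'
        },
        input: text
      }
    }.to_json
    url = "https://api.d-id.com/talks/streams/#{stream_id}"
    # HEADERS is defined elsewhere (not shown here).
    HTTParty.post(url, headers: HEADERS, body: body)
  end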

Here's the general streaming info in my database:

#<Stream:0x0000000108607468
 id: 45,
 did_stream_id: "strm_I0rEb8AP4l27_GH-WSq5R",
 did_session_id:
  "AWSALB=d3KG5dYM55uHGnlrK8TXWuoOHZnqDvoEAbH7BcI8KjnrRy6Hqc6ZtC6prQ1+jV6KJ4JcqI1nKMBnQD3tv/iFhNtR2SCMfVZzOGekRLhgPbnn3m6X8Vv4+6TtLDGM; Expires=Thu, 09 Nov 2023 12:16:42 GMT; Path=/; AWSALBCORS=d3KG5dYM55uHGnlrK8TXWuoOHZnqDvoEAbH7BcI8KjnrRy6Hqc6ZtC6prQ1+jV6KJ4JcqI1nKMBnQD3tv/iFhNtR2SCMfVZzOGekRLhgPbnn3m6X8Vv4+6TtLDGM; Expires=Thu, 09 Nov 2023 12:16:42 GMT; Path=/; SameSite=None; Secure",
 offer:
  "{\"type\"=>\"offer\", \"sdp\"=>\"v=0\\r\\no=- 1698927402664476 1 IN IP4 34.220.76.202\\r\\ns=Mountpoint 8934600989486917\\r\\nt=0 0\\r\\na=group:BUNDLE a v d\\r\\na=ice-options:trickle\\r\\na=fingerprint:sha-256 4B:4C:F0:95:42:8B:F1:2D:6C:EC:7D:C7:F6:89:58:D7:DD:39:3F:18:52:1D:D3:4D:8C:4F:8E:4E:2B:AE:52:AB\\r\\na=extmap-allow-mixed\\r\\na=msid-semantic: WMS *\\r\\nm=audio 9 UDP/TLS/RTP/SAVPF 111\\r\\nc=IN IP4 34.220.76.202\\r\\na=sendonly\\r\\na=mid:a\\r\\na=rtcp-mux\\r\\na=ice-ufrag:BemM\\r\\na=ice-pwd:Z4htJ5InK3kXWoFkpoTYwt\\r\\na=ice-options:trickle\\r\\na=setup:actpass\\r\\na=rtpmap:111 opus/48000/2\\r\\na=rtcp-fb:111 transport-cc\\r\\na=extmap:2 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time\\r\\na=extmap:4 urn:ietf:params:rtp-hdrext:sdes:mid\\r\\na=msid:janus janusa\\r\\na=ssrc:2109388804 cname:janus\\r\\na=candidate:1 1 udp 2015363327 34.220.76.202 58116 typ host\\r\\na=candidate:2 1 udp 1679819007 34.220.76.202 58116 typ srflx raddr 172.18.0.4 rport 58116\\r\\na=end-of-candidates\\r\\nm=video 9 UDP/TLS/RTP/SAVPF 100 101\\r\\nc=IN IP4 34.220.76.202\\r\\na=sendonly\\r\\na=mid:v\\r\\na=rtcp-mux\\r\\na=ice-ufrag:BemM\\r\\na=ice-pwd:Z4htJ5InK3kXWoFkpoTYwt\\r\\na=ice-options:trickle\\r\\na=setup:actpass\\r\\na=rtpmap:100 VP8/90000\\r\\na=rtcp-fb:100 ccm fir\\r\\na=rtcp-fb:100 nack\\r\\na=rtcp-fb:100 nack pli\\r\\na=rtcp-fb:100 goog-remb\\r\\na=rtcp-fb:100 transport-cc\\r\\na=extmap:2 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time\\r\\na=extmap:4 urn:ietf:params:rtp-hdrext:sdes:mid\\r\\na=rtpmap:101 rtx/90000\\r\\na=fmtp:101 apt=100\\r\\na=ssrc-group:FID 3483544582 889868455\\r\\na=msid:janus janusv\\r\\na=ssrc:3483544582 cname:janus\\r\\na=ssrc:889868455 cname:janus\\r\\na=candidate:1 1 udp 2015363327 34.220.76.202 58116 typ host\\r\\na=candidate:2 1 udp 1679819007 34.220.76.202 58116 typ srflx raddr 172.18.0.4 rport 58116\\r\\na=end-of-candidates\\r\\nm=application 9 UDP/DTLS/SCTP webrtc-datachannel\\r\\nc=IN IP4 
34.220.76.202\\r\\na=sendrecv\\r\\na=mid:d\\r\\na=sctp-port:5000\\r\\na=ice-ufrag:BemM\\r\\na=ice-pwd:Z4htJ5InK3kXWoFkpoTYwt\\r\\na=ice-options:trickle\\r\\na=setup:actpass\\r\\na=candidate:1 1 udp 2015363327 34.220.76.202 58116 typ host\\r\\na=candidate:2 1 udp 1679819007 34.220.76.202 58116 typ srflx raddr 172.18.0.4 rport 58116\\r\\na=end-of-candidates\\r\\n\"}",
 ice_servers:
  [{"urls"=>
     "stun:stun.kinesisvideo.us-west-2.amazonaws.com:443"},
   {"urls"=>
     ["turn:54-191-109-117.t-853e5b95.kinesisvideo.us-west-2.amazonaws.com:443?transport=udp",
      "turns:54-191-109-117.t-853e5b95.kinesisvideo.us-west-2.amazonaws.com:443?transport=udp",
      "turns:54-191-109-117.t-853e5b95.kinesisvideo.us-west-2.amazonaws.com:443?transport=tcp"],
    "username"=>
     "1698927702:djE6YXJuOmF3czpraW5lc2lzdmlkZW86dXMtd2VzdC0yOjg5OTAxNjUwOTUyMDpjaGFubmVsL3RhbGtzLXN0cmVhbWVyLXByb2QtMDAvMTY3NzE2OTkyNTMzNQ==",
    "credential"=>
     "XYp3xNz220oqjWlsxYGcJ9MxNgCVopYYl7f4Gvsdz80="}],
 metadata: {},
 owner_id: 3,
 owner_type: "AiModel",
 subtype: "D-ID",
 created_at:
  Thu, 02 Nov 2023 12:16:44.871475000 UTC +00:00,
 updated_at:
  Thu, 02 Nov 2023 12:16:44.871475000 UTC +00:00>

It's a bit ugly, but I'm trying to follow the onTrack logic in your sample repo:

onTrack(event) {
		/**
		 * The following code is designed to provide information about whether there is currently data
		 * that's being streamed - It does so by periodically looking for changes in total stream data size
		 *
		 * This information in our case is used in order to show idle video while no talk is streaming.
		 * To create this idle video use the POST https://api.d-id.com/talks endpoint with a silent audio file or a text script with only ssml breaks
		 * https://docs.aws.amazon.com/polly/latest/dg/supportedtags.html#break-tag
		 * for seamless results use `config.fluent: true` and provide the same configuration as the streaming video
		 */

		if (!event.track) return;

		// Shorthand for the shared state object used below.
		const stream = dfg.project.add_ons.interview_stream;

		// Poll the stats twice a second. Note that this interval is never cleared.
		const statsIntervalId = setInterval(async () => {
			const stats = await this.getStats(event.track);
			stats.forEach((report) => {
				// `kind` is the standard field name; `mediaType` is the legacy one.
				if (report.type === 'inbound-rtp' && (report.kind || report.mediaType) === 'video') {
					// Video counts as "playing" while the received byte count keeps growing.
					const isPlaying = report.bytesReceived > stream.current.lastBytesReceived;

					if (stream.current.videoIsPlaying !== isPlaying) {
						stream.current.videoIsPlaying = isPlaying;
						stream.onVideoStatusChange(isPlaying, event.streams[0]);
					}
					stream.current.lastBytesReceived = report.bytesReceived;
				}
			});
		}, 500);
	}

It seems to "play" the static video plus audio and then immediately go back to idling. According to the logs, that makes sense and is how it should work, I think (if the lips were actually moving).
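
To check the play/idle switching without spending more credits, I could pull the byte-count comparison out into a small pure function and feed it made-up samples. This is only a sketch: `makeVideoStatusTracker` is a hypothetical name of mine, and the numbers are invented rather than taken from a real stream.

```javascript
// Hypothetical helper: mirrors the byte-count comparison from onTrack,
// but without timers or WebRTC so it can run anywhere.
function makeVideoStatusTracker(onChange) {
  let lastBytesReceived = 0;
  let videoIsPlaying = false;
  return function sample(bytesReceived) {
    // "Playing" means more bytes arrived since the previous sample.
    const isPlaying = bytesReceived > lastBytesReceived;
    if (isPlaying !== videoIsPlaying) {
      videoIsPlaying = isPlaying;
      onChange(isPlaying);
    }
    lastBytesReceived = bytesReceived;
    return videoIsPlaying;
  };
}

// Invented samples: bytes grow for two polls, then stop growing.
const changes = [];
const sample = makeVideoStatusTracker((playing) => changes.push(playing));
[0, 1000, 2500, 2500, 2500].forEach((bytes) => sample(bytes));
console.log(changes); // [ true, false ]
```

With these samples the callback fires once when bytes start arriving and once when they stop, which matches the play-then-idle pattern I see in the logs.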

I've been working super hard on this and would appreciate any help! Thanks!