Auto-targeting of designated speaker in multi-person image using Hedra API

Hi!

We’re currently using the Hedra API to generate videos with dialogue, and we’re running into a problem with multi-person images. In your frontend, we can assign the speaker with the new drag-and-drop tool, which helps get the dialogue coming from the right character. However, because we’re automating the generation step, we need a way to assign the speaker as part of our API call.

Without access to the frontend targeting tool, the wrong character is assigned the dialogue, as the attached example videos show. In these over-the-shoulder, two-person shots, the person in the foreground is incorrectly selected as the speaker by default, or in many cases both people end up mouthing the dialogue.


Would it be possible to introduce an additional field in the generation request that lets us identify which character in the scene should speak? We appreciate this would mean building a system (an auto-targeting tool) that recognises individuals by description or by position within the scene image, but it would go a long way towards helping us further automate our generative pipeline, and it would help others who hit the same issue when using your API.
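To make the request concrete, here's a rough sketch of the kind of payload we have in mind. Everything in it is hypothetical: the `speaker_target` field, its `description`/`x`/`y` keys, and the `image`/`text` key names are placeholders we made up for illustration, not part of the existing Hedra API. The idea is simply that the speaker could be picked either by a short description or by a normalized point on the image, much like the drag-and-drop tool does in the frontend.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class SpeakerTarget:
    """Hypothetical selector for the character who should speak."""
    description: Optional[str] = None  # e.g. "woman facing the camera"
    x: Optional[float] = None          # normalized 0..1, left to right
    y: Optional[float] = None          # normalized 0..1, top to bottom

    def validate(self) -> None:
        has_point = self.x is not None and self.y is not None
        if not self.description and not has_point:
            raise ValueError("give a description or an (x, y) point")
        if (self.x is None) != (self.y is None):
            raise ValueError("x and y must be given together")
        if has_point and not (0.0 <= self.x <= 1.0 and 0.0 <= self.y <= 1.0):
            raise ValueError("x and y must be normalized to the range 0..1")


def build_generation_request(image_ref: str, dialogue: str,
                             speaker: SpeakerTarget) -> dict:
    """Assemble a request body; all key names here are assumptions."""
    speaker.validate()
    # Only send the selector fields that were actually set.
    target = {k: v for k, v in asdict(speaker).items() if v is not None}
    return {"image": image_ref, "text": dialogue, "speaker_target": target}


# Example: target the person in the background, right of centre.
payload = build_generation_request(
    "over-the-shoulder-shot", "Hello there.", SpeakerTarget(x=0.7, y=0.4)
)
```

Either selector on its own would be enough for our pipeline; the point-based one would probably be easier for us to automate, since we already know roughly where each character sits in the frame.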

Thanks!

- Dan.

Status

In Review

Board

💡 Feature Request

Date

3 months ago

Author

Dan Dawkins