Merged with master

Prianka Liz Kariat 2023-05-08 16:24:54 +05:30
commit f86188f8e1
91 changed files with 3300 additions and 1467 deletions

README.md (195)
View File

@@ -4,8 +4,6 @@ title: Home
 nav_order: 1
 ---
 
-![MediaPipe](https://mediapipe.dev/images/mediapipe_small.png)
-
 ----
 
 **Attention:** *Thanks for your interest in MediaPipe! We have moved to
@@ -14,86 +12,111 @@ as the primary developer documentation site for MediaPipe as of April 3, 2023.*
 *This notice and web page will be removed on June 1, 2023.*
 
-----
-
-<br><br><br><br><br><br><br><br><br><br>
-<br><br><br><br><br><br><br><br><br><br>
-<br><br><br><br><br><br><br><br><br><br>
-
---------------------------------------------------------------------------------
-
-## Live ML anywhere
-
-[MediaPipe](https://google.github.io/mediapipe/) offers cross-platform, customizable
-ML solutions for live and streaming media.
-
-![accelerated.png](https://mediapipe.dev/images/accelerated_small.png) | ![cross_platform.png](https://mediapipe.dev/images/cross_platform_small.png)
-:---: | :---:
-***End-to-End acceleration***: *Built-in fast ML inference and processing accelerated even on common hardware* | ***Build once, deploy anywhere***: *Unified solution works across Android, iOS, desktop/cloud, web and IoT*
-![ready_to_use.png](https://mediapipe.dev/images/ready_to_use_small.png) | ![open_source.png](https://mediapipe.dev/images/open_source_small.png)
-***Ready-to-use solutions***: *Cutting-edge ML solutions demonstrating full power of the framework* | ***Free and open source***: *Framework and solutions both under Apache 2.0, fully extensible and customizable*
-
-----
-
-## ML solutions in MediaPipe
-
-Face Detection | Face Mesh | Iris | Hands | Pose | Holistic
-:---: | :---: | :---: | :---: | :---: | :---:
-[![face_detection](https://mediapipe.dev/images/mobile/face_detection_android_gpu_small.gif)](https://google.github.io/mediapipe/solutions/face_detection) | [![face_mesh](https://mediapipe.dev/images/mobile/face_mesh_android_gpu_small.gif)](https://google.github.io/mediapipe/solutions/face_mesh) | [![iris](https://mediapipe.dev/images/mobile/iris_tracking_android_gpu_small.gif)](https://google.github.io/mediapipe/solutions/iris) | [![hand](https://mediapipe.dev/images/mobile/hand_tracking_android_gpu_small.gif)](https://google.github.io/mediapipe/solutions/hands) | [![pose](https://mediapipe.dev/images/mobile/pose_tracking_android_gpu_small.gif)](https://google.github.io/mediapipe/solutions/pose) | [![hair_segmentation](https://mediapipe.dev/images/mobile/holistic_tracking_android_gpu_small.gif)](https://google.github.io/mediapipe/solutions/holistic)
-
-Hair Segmentation | Object Detection | Box Tracking | Instant Motion Tracking | Objectron | KNIFT
-:---: | :---: | :---: | :---: | :---: | :---:
-[![hair_segmentation](https://mediapipe.dev/images/mobile/hair_segmentation_android_gpu_small.gif)](https://google.github.io/mediapipe/solutions/hair_segmentation) | [![object_detection](https://mediapipe.dev/images/mobile/object_detection_android_gpu_small.gif)](https://google.github.io/mediapipe/solutions/object_detection) | [![box_tracking](https://mediapipe.dev/images/mobile/object_tracking_android_gpu_small.gif)](https://google.github.io/mediapipe/solutions/box_tracking) | [![instant_motion_tracking](https://mediapipe.dev/images/mobile/instant_motion_tracking_android_small.gif)](https://google.github.io/mediapipe/solutions/instant_motion_tracking) | [![objectron](https://mediapipe.dev/images/mobile/objectron_chair_android_gpu_small.gif)](https://google.github.io/mediapipe/solutions/objectron) | [![knift](https://mediapipe.dev/images/mobile/template_matching_android_cpu_small.gif)](https://google.github.io/mediapipe/solutions/knift)
-
-<!-- []() in the first cell is needed to preserve table formatting in GitHub Pages. -->
-<!-- Whenever this table is updated, paste a copy to solutions/solutions.md. -->
-
-[]() | [Android](https://google.github.io/mediapipe/getting_started/android) | [iOS](https://google.github.io/mediapipe/getting_started/ios) | [C++](https://google.github.io/mediapipe/getting_started/cpp) | [Python](https://google.github.io/mediapipe/getting_started/python) | [JS](https://google.github.io/mediapipe/getting_started/javascript) | [Coral](https://github.com/google/mediapipe/tree/master/mediapipe/examples/coral/README.md)
-:--- | :---: | :---: | :---: | :---: | :---: | :---:
-[Face Detection](https://google.github.io/mediapipe/solutions/face_detection) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅
-[Face Mesh](https://google.github.io/mediapipe/solutions/face_mesh) | ✅ | ✅ | ✅ | ✅ | ✅ |
-[Iris](https://google.github.io/mediapipe/solutions/iris) | ✅ | ✅ | ✅ | | |
-[Hands](https://google.github.io/mediapipe/solutions/hands) | ✅ | ✅ | ✅ | ✅ | ✅ |
-[Pose](https://google.github.io/mediapipe/solutions/pose) | ✅ | ✅ | ✅ | ✅ | ✅ |
-[Holistic](https://google.github.io/mediapipe/solutions/holistic) | ✅ | ✅ | ✅ | ✅ | ✅ |
-[Selfie Segmentation](https://google.github.io/mediapipe/solutions/selfie_segmentation) | ✅ | ✅ | ✅ | ✅ | ✅ |
-[Hair Segmentation](https://google.github.io/mediapipe/solutions/hair_segmentation) | ✅ | | ✅ | | |
-[Object Detection](https://google.github.io/mediapipe/solutions/object_detection) | ✅ | ✅ | ✅ | | | ✅
-[Box Tracking](https://google.github.io/mediapipe/solutions/box_tracking) | ✅ | ✅ | ✅ | | |
-[Instant Motion Tracking](https://google.github.io/mediapipe/solutions/instant_motion_tracking) | ✅ | | | | |
-[Objectron](https://google.github.io/mediapipe/solutions/objectron) | ✅ | | ✅ | ✅ | ✅ |
-[KNIFT](https://google.github.io/mediapipe/solutions/knift) | ✅ | | | | |
-[AutoFlip](https://google.github.io/mediapipe/solutions/autoflip) | | | ✅ | | |
-[MediaSequence](https://google.github.io/mediapipe/solutions/media_sequence) | | | ✅ | | |
-[YouTube 8M](https://google.github.io/mediapipe/solutions/youtube_8m) | | | ✅ | | |
-
-See also
-[MediaPipe Models and Model Cards](https://google.github.io/mediapipe/solutions/models)
-for ML models released in MediaPipe.
-
-## Getting started
-
-To start using MediaPipe
-[solutions](https://google.github.io/mediapipe/solutions/solutions) with only a few
-lines code, see example code and demos in
-[MediaPipe in Python](https://google.github.io/mediapipe/getting_started/python) and
-[MediaPipe in JavaScript](https://google.github.io/mediapipe/getting_started/javascript).
-
-To use MediaPipe in C++, Android and iOS, which allow further customization of
-the [solutions](https://google.github.io/mediapipe/solutions/solutions) as well as
-building your own, learn how to
-[install](https://google.github.io/mediapipe/getting_started/install) MediaPipe and
-start building example applications in
-[C++](https://google.github.io/mediapipe/getting_started/cpp),
-[Android](https://google.github.io/mediapipe/getting_started/android) and
-[iOS](https://google.github.io/mediapipe/getting_started/ios).
-
-The source code is hosted in the
-[MediaPipe Github repository](https://github.com/google/mediapipe), and you can
-run code search using
-[Google Open Source Code Search](https://cs.opensource.google/mediapipe/mediapipe).
-
-## Publications
+![MediaPipe](https://developers.google.com/static/mediapipe/images/home/hero_01_1920.png)
+
+**Attention**: MediaPipe Solutions Preview is an early release. [Learn
+more](https://developers.google.com/mediapipe/solutions/about#notice).
+
+**On-device machine learning for everyone**
+
+Delight your customers with innovative machine learning features. MediaPipe
+contains everything that you need to customize and deploy to mobile (Android,
+iOS), web, desktop, edge devices, and IoT, effortlessly.
+
+* [See demos](https://goo.gle/mediapipe-studio)
+* [Learn more](https://developers.google.com/mediapipe/solutions)
+
+## Get started
+
+You can get started with MediaPipe Solutions by checking out any of the
+developer guides for
+[vision](https://developers.google.com/mediapipe/solutions/vision/object_detector),
+[text](https://developers.google.com/mediapipe/solutions/text/text_classifier),
+and
+[audio](https://developers.google.com/mediapipe/solutions/audio/audio_classifier)
+tasks. If you need help setting up a development environment for use with
+MediaPipe Tasks, check out the setup guides for
+[Android](https://developers.google.com/mediapipe/solutions/setup_android), [web
+apps](https://developers.google.com/mediapipe/solutions/setup_web), and
+[Python](https://developers.google.com/mediapipe/solutions/setup_python).
+
+## Solutions
+
+MediaPipe Solutions provides a suite of libraries and tools for you to quickly
+apply artificial intelligence (AI) and machine learning (ML) techniques in your
+applications. You can plug these solutions into your applications immediately,
+customize them to your needs, and use them across multiple development
+platforms. MediaPipe Solutions is part of the MediaPipe [open source
+project](https://github.com/google/mediapipe), so you can further customize the
+solutions code to meet your application needs.
+
+These libraries and resources provide the core functionality for each MediaPipe
+Solution:
+
+* **MediaPipe Tasks**: Cross-platform APIs and libraries for deploying
+  solutions. [Learn
+  more](https://developers.google.com/mediapipe/solutions/tasks).
+* **MediaPipe models**: Pre-trained, ready-to-run models for use with each
+  solution.
+
+These tools let you customize and evaluate solutions:
+
+* **MediaPipe Model Maker**: Customize models for solutions with your data.
+  [Learn more](https://developers.google.com/mediapipe/solutions/model_maker).
+* **MediaPipe Studio**: Visualize, evaluate, and benchmark solutions in your
+  browser. [Learn
+  more](https://developers.google.com/mediapipe/solutions/studio).
+
+### Legacy solutions
+
+We have ended support for [these MediaPipe Legacy Solutions](https://developers.google.com/mediapipe/solutions/guide#legacy)
+as of March 1, 2023. All other MediaPipe Legacy Solutions will be upgraded to
+a new MediaPipe Solution. See the [Solutions guide](https://developers.google.com/mediapipe/solutions/guide#legacy)
+for details. The [code repository](https://github.com/google/mediapipe/tree/master/mediapipe)
+and prebuilt binaries for all MediaPipe Legacy Solutions will continue to be
+provided on an as-is basis.
+
+For more on the legacy solutions, see the [documentation](https://github.com/google/mediapipe/tree/master/docs/solutions).
+
+## Framework
+
+To start using MediaPipe Framework, [install MediaPipe
+Framework](https://developers.google.com/mediapipe/framework/getting_started/install)
+and start building example applications in C++, Android, and iOS.
+
+[MediaPipe Framework](https://developers.google.com/mediapipe/framework) is the
+low-level component used to build efficient on-device machine learning
+pipelines, similar to the premade MediaPipe Solutions.
+
+Before using MediaPipe Framework, familiarize yourself with the following key
+[Framework
+concepts](https://developers.google.com/mediapipe/framework/framework_concepts/overview.md):
+
+* [Packets](https://developers.google.com/mediapipe/framework/framework_concepts/packets.md)
+* [Graphs](https://developers.google.com/mediapipe/framework/framework_concepts/graphs.md)
+* [Calculators](https://developers.google.com/mediapipe/framework/framework_concepts/calculators.md)
+
+## Community
+
+* [Slack community](https://mediapipe.page.link/joinslack) for MediaPipe
+  users.
+* [Discuss](https://groups.google.com/forum/#!forum/mediapipe) - General
+  community discussion around MediaPipe.
+* [Awesome MediaPipe](https://mediapipe.page.link/awesome-mediapipe) - A
+  curated list of awesome MediaPipe related frameworks, libraries and
+  software.
+
+## Contributing
+
+We welcome contributions. Please follow these
+[guidelines](https://github.com/google/mediapipe/blob/master/CONTRIBUTING.md).
+
+We use GitHub issues for tracking requests and bugs. Please post questions to
+the MediaPipe Stack Overflow with a `mediapipe` tag.
+
+## Resources
+
+### Publications
 
 * [Bringing artworks to life with AR](https://developers.googleblog.com/2021/07/bringing-artworks-to-life-with-ar.html)
   in Google Developers Blog
@@ -102,7 +125,8 @@ run code search using
 * [SignAll SDK: Sign language interface using MediaPipe is now available for
   developers](https://developers.googleblog.com/2021/04/signall-sdk-sign-language-interface-using-mediapipe-now-available.html)
   in Google Developers Blog
-* [MediaPipe Holistic - Simultaneous Face, Hand and Pose Prediction, on Device](https://ai.googleblog.com/2020/12/mediapipe-holistic-simultaneous-face.html)
+* [MediaPipe Holistic - Simultaneous Face, Hand and Pose Prediction, on
+  Device](https://ai.googleblog.com/2020/12/mediapipe-holistic-simultaneous-face.html)
   in Google AI Blog
 * [Background Features in Google Meet, Powered by Web ML](https://ai.googleblog.com/2020/10/background-features-in-google-meet.html)
   in Google AI Blog
@@ -130,43 +154,6 @@ run code search using
   in Google AI Blog
 * [MediaPipe: A Framework for Building Perception Pipelines](https://arxiv.org/abs/1906.08172)
 
-## Videos
+### Videos
 
 * [YouTube Channel](https://www.youtube.com/c/MediaPipe)
-
-## Events
-
-* [MediaPipe Seattle Meetup, Google Building Waterside, 13 Feb 2020](https://mediapipe.page.link/seattle2020)
-* [AI Nextcon 2020, 12-16 Feb 2020, Seattle](http://aisea20.xnextcon.com/)
-* [MediaPipe Madrid Meetup, 16 Dec 2019](https://www.meetup.com/Madrid-AI-Developers-Group/events/266329088/)
-* [MediaPipe London Meetup, Google 123 Building, 12 Dec 2019](https://www.meetup.com/London-AI-Tech-Talk/events/266329038)
-* [ML Conference, Berlin, 11 Dec 2019](https://mlconference.ai/machine-learning-advanced-development/mediapipe-building-real-time-cross-platform-mobile-web-edge-desktop-video-audio-ml-pipelines/)
-* [MediaPipe Berlin Meetup, Google Berlin, 11 Dec 2019](https://www.meetup.com/Berlin-AI-Tech-Talk/events/266328794/)
-* [The 3rd Workshop on YouTube-8M Large Scale Video Understanding Workshop,
-  Seoul, Korea ICCV
-  2019](https://research.google.com/youtube8m/workshop2019/index.html)
-* [AI DevWorld 2019, 10 Oct 2019, San Jose, CA](https://aidevworld.com)
-* [Google Industry Workshop at ICIP 2019, 24 Sept 2019, Taipei, Taiwan](http://2019.ieeeicip.org/?action=page4&id=14#Google)
-  ([presentation](https://docs.google.com/presentation/d/e/2PACX-1vRIBBbO_LO9v2YmvbHHEt1cwyqH6EjDxiILjuT0foXy1E7g6uyh4CesB2DkkEwlRDO9_lWfuKMZx98T/pub?start=false&loop=false&delayms=3000&slide=id.g556cc1a659_0_5))
-* [Open sourced at CVPR 2019, 17~20 June, Long Beach, CA](https://sites.google.com/corp/view/perception-cv4arvr/mediapipe)
-
-## Community
-
-* [Awesome MediaPipe](https://mediapipe.page.link/awesome-mediapipe) - A
-  curated list of awesome MediaPipe related frameworks, libraries and software
-* [Slack community](https://mediapipe.page.link/joinslack) for MediaPipe users
-* [Discuss](https://groups.google.com/forum/#!forum/mediapipe) - General
-  community discussion around MediaPipe
-
-## Alpha disclaimer
-
-MediaPipe is currently in alpha at v0.7. We may be still making breaking API
-changes and expect to get to stable APIs by v1.0.
-
-## Contributing
-
-We welcome contributions. Please follow these
-[guidelines](https://github.com/google/mediapipe/blob/master/CONTRIBUTING.md).
-
-We use GitHub issues for tracking requests and bugs. Please post questions to
-the MediaPipe Stack Overflow with a `mediapipe` tag.
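The new "Get started" section above points developers at the MediaPipe Tasks guides. As a rough, hedged illustration of what those guides cover (assuming `pip install mediapipe`, a downloaded `efficientdet_lite0.tflite` model, and a local `image.jpg` — all placeholder file names, not files shipped with this commit), a vision task can be run in Python roughly like this:

```python
# Sketch only: model and image paths below are placeholders.
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

options = vision.ObjectDetectorOptions(
    base_options=python.BaseOptions(model_asset_path='efficientdet_lite0.tflite'),
    score_threshold=0.5,
)
with vision.ObjectDetector.create_from_options(options) as detector:
    image = mp.Image.create_from_file('image.jpg')
    result = detector.detect(image)
    for detection in result.detections:
        category = detection.categories[0]
        print(category.category_name, category.score, detection.bounding_box)
```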

View File

@@ -375,6 +375,18 @@ http_archive(
     url = "https://github.com/opencv/opencv/releases/download/3.2.0/opencv-3.2.0-ios-framework.zip",
 )
 
+# Building an opencv.xcframework from the OpenCV 4.5.1 sources is necessary for
+# MediaPipe iOS Task Libraries to be supported on arm64 (M1) Macs. An
+# `opencv.xcframework` archive has not been released, so it is recommended to
+# build it from source using a script provided in OpenCV 4.5.0 and later.
+http_archive(
+    name = "ios_opencv_source",
+    sha256 = "5fbc26ee09e148a4d494b225d04217f7c913ca1a4d46115b70cca3565d7bbe05",
+    build_file = "@//third_party:opencv_ios_source.BUILD",
+    type = "zip",
+    url = "https://github.com/opencv/opencv/archive/refs/tags/4.5.1.zip",
+)
+
 http_archive(
     name = "stblib",
     strip_prefix = "stb-b42009b3b9d4ca35bc703f5310eedc74f584be58",

View File

@@ -141,6 +141,7 @@ config_setting(
         "ios_armv7",
         "ios_arm64",
         "ios_arm64e",
+        "ios_sim_arm64",
     ]
 ]

View File

@ -33,7 +33,9 @@ bzl_library(
srcs = [ srcs = [
"transitive_protos.bzl", "transitive_protos.bzl",
], ],
visibility = ["//mediapipe/framework:__subpackages__"], visibility = [
"//mediapipe/framework:__subpackages__",
],
) )
bzl_library( bzl_library(

View File

@@ -23,15 +23,13 @@ package mediapipe;
 option java_package = "com.google.mediapipe.proto";
 option java_outer_classname = "CalculatorOptionsProto";
 
-// Options for Calculators. Each Calculator implementation should
-// have its own options proto, which should look like this:
+// Options for Calculators, DEPRECATED. New calculators are encouraged to use
+// proto3 syntax options:
 //
 //   message MyCalculatorOptions {
-//     extend CalculatorOptions {
-//       optional MyCalculatorOptions ext = <unique id, e.g. the CL#>;
-//     }
-//     optional string field_needed_by_my_calculator = 1;
-//     optional int32 another_field = 2;
+//     // proto3 does not expect "optional"
+//     string field_needed_by_my_calculator = 1;
+//     int32 another_field = 2;
 //     // etc
 //   }
 message CalculatorOptions {

View File

@@ -15,9 +15,7 @@
 licenses(["notice"])
 
-package(
-    default_visibility = ["//mediapipe/framework:__subpackages__"],
-)
+package(default_visibility = ["//mediapipe/framework:__subpackages__"])
 
 cc_library(
     name = "simple_calculator",

View File

@@ -974,7 +974,7 @@ class TemplateParser::Parser::ParserImpl {
   }
 
   // Consumes an identifier and saves its value in the identifier parameter.
-  // Returns false if the token is not of type IDENTFIER.
+  // Returns false if the token is not of type IDENTIFIER.
   bool ConsumeIdentifier(std::string* identifier) {
     if (LookingAtType(io::Tokenizer::TYPE_IDENTIFIER)) {
       *identifier = tokenizer_.current().text;
@@ -1672,7 +1672,9 @@ class TemplateParser::Parser::MediaPipeParserImpl
     if (field_type == ProtoUtilLite::FieldType::TYPE_MESSAGE) {
       *args = {""};
     } else {
-      MEDIAPIPE_CHECK_OK(ProtoUtilLite::Serialize({"1"}, field_type, args));
+      constexpr char kPlaceholderValue[] = "1";
+      MEDIAPIPE_CHECK_OK(
+          ProtoUtilLite::Serialize({kPlaceholderValue}, field_type, args));
     }
   }

View File

@@ -19,9 +19,7 @@ load(
 licenses(["notice"])
 
-package(
-    default_visibility = ["//mediapipe/model_maker/python/vision/gesture_recognizer:__subpackages__"],
-)
+package(default_visibility = ["//mediapipe/model_maker/python/vision/gesture_recognizer:__subpackages__"])
 
 mediapipe_files(
     srcs = [

View File

@@ -19,9 +19,7 @@ load(
 licenses(["notice"])
 
-package(
-    default_visibility = ["//mediapipe/model_maker/python/text/text_classifier:__subpackages__"],
-)
+package(default_visibility = ["//mediapipe/model_maker/python/text/text_classifier:__subpackages__"])
 
 mediapipe_files(
     srcs = [

View File

@@ -14,9 +14,7 @@
 # Placeholder for internal Python strict library and test compatibility macro.
 
-package(
-    default_visibility = ["//mediapipe:__subpackages__"],
-)
+package(default_visibility = ["//mediapipe:__subpackages__"])
 
 licenses(["notice"])

View File

@@ -17,9 +17,7 @@
 licenses(["notice"])
 
-package(
-    default_visibility = ["//mediapipe:__subpackages__"],
-)
+package(default_visibility = ["//mediapipe:__subpackages__"])
 
 py_library(
     name = "data_util",

View File

@@ -15,9 +15,7 @@
 # Placeholder for internal Python strict library and test compatibility macro.
 # Placeholder for internal Python strict test compatibility macro.
 
-package(
-    default_visibility = ["//mediapipe:__subpackages__"],
-)
+package(default_visibility = ["//mediapipe:__subpackages__"])
 
 licenses(["notice"])

View File

@@ -17,9 +17,7 @@
 licenses(["notice"])
 
-package(
-    default_visibility = ["//mediapipe:__subpackages__"],
-)
+package(default_visibility = ["//mediapipe:__subpackages__"])
 
 py_library(
     name = "test_util",

View File

@@ -14,9 +14,7 @@
 # Placeholder for internal Python strict library and test compatibility macro.
 
-package(
-    default_visibility = ["//mediapipe:__subpackages__"],
-)
+package(default_visibility = ["//mediapipe:__subpackages__"])
 
 licenses(["notice"])

View File

@@ -15,9 +15,7 @@
 # Placeholder for internal Python strict library and test compatibility macro.
 # Placeholder for internal Python strict test compatibility macro.
 
-package(
-    default_visibility = ["//mediapipe:__subpackages__"],
-)
+package(default_visibility = ["//mediapipe:__subpackages__"])
 
 licenses(["notice"])

View File

@@ -12,8 +12,6 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-package(
-    default_visibility = ["//mediapipe:__subpackages__"],
-)
+package(default_visibility = ["//mediapipe:__subpackages__"])
 
 licenses(["notice"])

View File

@@ -17,9 +17,7 @@
 licenses(["notice"])
 
-package(
-    default_visibility = ["//mediapipe:__subpackages__"],
-)
+package(default_visibility = ["//mediapipe:__subpackages__"])
 
 # TODO: Remove the unnecessary test data once the demo data are moved to an open-sourced
 # directory.

View File

@@ -17,9 +17,7 @@
 licenses(["notice"])
 
-package(
-    default_visibility = ["//mediapipe:__subpackages__"],
-)
+package(default_visibility = ["//mediapipe:__subpackages__"])
 
 ######################################################################
 # Public target of the MediaPipe Model Maker ImageClassifier APIs.

View File

@@ -17,9 +17,7 @@
 licenses(["notice"])
 
-package(
-    default_visibility = ["//mediapipe:__subpackages__"],
-)
+package(default_visibility = ["//mediapipe:__subpackages__"])
 
 py_library(
     name = "object_detector_import",
@@ -88,6 +86,17 @@ py_test(
     ],
 )
 
+py_library(
+    name = "detection",
+    srcs = ["detection.py"],
+)
+
+py_test(
+    name = "detection_test",
+    srcs = ["detection_test.py"],
+    deps = [":detection"],
+)
+
 py_library(
     name = "hyperparameters",
     srcs = ["hyperparameters.py"],
@@ -116,6 +125,7 @@ py_library(
     name = "model",
     srcs = ["model.py"],
     deps = [
+        ":detection",
         ":model_options",
        ":model_spec",
     ],
@@ -163,6 +173,7 @@ py_library(
         "//mediapipe/model_maker/python/core/tasks:classifier",
         "//mediapipe/model_maker/python/core/utils:model_util",
         "//mediapipe/model_maker/python/core/utils:quantization",
+        "//mediapipe/tasks/python/metadata/metadata_writers:metadata_info",
         "//mediapipe/tasks/python/metadata/metadata_writers:metadata_writer",
         "//mediapipe/tasks/python/metadata/metadata_writers:object_detector",
     ],

View File

@@ -32,6 +32,7 @@ ObjectDetectorOptions = object_detector_options.ObjectDetectorOptions
 # Remove duplicated and non-public API
 del dataset
 del dataset_util  # pylint: disable=undefined-variable
+del detection  # pylint: disable=undefined-variable
 del hyperparameters
 del model  # pylint: disable=undefined-variable
 del model_options

View File

@@ -106,7 +106,7 @@ class Dataset(classification_dataset.ClassificationDataset):
     ...
     Each <file0>.xml annotation file should have the following format:
       <annotation>
-        <filename>file0.jpg<filename>
+        <filename>file0.jpg</filename>
         <object>
           <name>kangaroo</name>
          <bndbox>
@@ -114,6 +114,7 @@ class Dataset(classification_dataset.ClassificationDataset):
             <ymin>89</ymin>
             <xmax>386</xmax>
             <ymax>262</ymax>
+          </bndbox>
         </object>
         <object>...</object>
       </annotation>
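The corrected docstring above describes PASCAL VOC-style XML annotations. For orientation, a hedged sketch of loading such a folder with MediaPipe Model Maker (assuming the `mediapipe-model-maker` package; `dataset/train` and `dataset/validation` are placeholder paths laid out as the docstring describes):

```python
from mediapipe_model_maker import object_detector

# Placeholder paths: each folder is expected to hold images plus PASCAL VOC
# XML annotation files like the one shown in the docstring above.
train_data = object_detector.Dataset.from_pascal_voc_folder('dataset/train')
validation_data = object_detector.Dataset.from_pascal_voc_folder('dataset/validation')

print('train size:', train_data.size)
```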

View File

@@ -0,0 +1,34 @@
+# Copyright 2023 The MediaPipe Authors.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Custom Detection export module for Object Detection."""
+
+from typing import Any, Mapping
+
+from official.vision.serving import detection
+
+
+class DetectionModule(detection.DetectionModule):
+  """A serving detection module for exporting the model.
+
+  This module overrides the tensorflow_models DetectionModule by only outputting
+  the pre-nms detection_boxes and detection_scores.
+  """
+
+  def serve(self, images) -> Mapping[str, Any]:
+    result = super().serve(images)
+    final_outputs = {
+        'detection_boxes': result['detection_boxes'],
+        'detection_scores': result['detection_scores'],
+    }
+    return final_outputs

View File

@@ -0,0 +1,73 @@
+# Copyright 2023 The MediaPipe Authors.
+#
+# Licensed under the Apache License, Version 2.0 (the 'License');
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an 'AS IS' BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from unittest import mock
+
+import tensorflow as tf
+
+from mediapipe.model_maker.python.vision.object_detector import detection
+from official.core import config_definitions as cfg
+from official.vision import configs
+from official.vision.serving import detection as detection_module
+
+
+class ObjectDetectorTest(tf.test.TestCase):
+
+  @mock.patch.object(detection_module.DetectionModule, 'serve', autospec=True)
+  def test_detection_module(self, mock_serve):
+    mock_serve.return_value = {
+        'detection_boxes': 1,
+        'detection_scores': 2,
+        'detection_classes': 3,
+        'num_detections': 4,
+    }
+    model_config = configs.retinanet.RetinaNet(
+        min_level=3,
+        max_level=7,
+        num_classes=10,
+        input_size=[256, 256, 3],
+        anchor=configs.retinanet.Anchor(
+            num_scales=3, aspect_ratios=[0.5, 1.0, 2.0], anchor_size=3
+        ),
+        backbone=configs.backbones.Backbone(
+            type='mobilenet', mobilenet=configs.backbones.MobileNet()
+        ),
+        decoder=configs.decoders.Decoder(
+            type='fpn',
+            fpn=configs.decoders.FPN(
+                num_filters=128, use_separable_conv=True, use_keras_layer=True
+            ),
+        ),
+        head=configs.retinanet.RetinaNetHead(
+            num_filters=128, use_separable_conv=True
+        ),
+        detection_generator=configs.retinanet.DetectionGenerator(),
+        norm_activation=configs.common.NormActivation(activation='relu6'),
+    )
+    task_config = configs.retinanet.RetinaNetTask(model=model_config)
+    params = cfg.ExperimentConfig(
+        task=task_config,
+    )
+    detection_instance = detection.DetectionModule(
+        params=params, batch_size=1, input_image_size=[256, 256]
+    )
+    outputs = detection_instance.serve(0)
+    expected_outputs = {
+        'detection_boxes': 1,
+        'detection_scores': 2,
+    }
+    self.assertAllEqual(outputs, expected_outputs)
+
+
+if __name__ == '__main__':
+  tf.test.main()

View File

@@ -27,8 +27,6 @@ class HParams(hp.BaseHParams):
     learning_rate: Learning rate to use for gradient descent training.
     batch_size: Batch size for training.
     epochs: Number of training iterations over the dataset.
-    do_fine_tuning: If true, the base module is trained together with the
-      classification layer on top.
     cosine_decay_epochs: The number of epochs for cosine decay learning rate.
       See
       https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/schedules/CosineDecay
@@ -39,13 +37,13 @@ class HParams(hp.BaseHParams):
   """
 
   # Parameters from BaseHParams class.
-  learning_rate: float = 0.003
-  batch_size: int = 32
-  epochs: int = 10
+  learning_rate: float = 0.3
+  batch_size: int = 8
+  epochs: int = 30
 
   # Parameters for cosine learning rate decay
   cosine_decay_epochs: Optional[int] = None
-  cosine_decay_alpha: float = 0.0
+  cosine_decay_alpha: float = 1.0
 
 
 @dataclasses.dataclass
@@ -67,8 +65,8 @@ class QATHParams:
     for more information.
   """
 
-  learning_rate: float = 0.03
-  batch_size: int = 32
-  epochs: int = 10
-  decay_steps: int = 231
+  learning_rate: float = 0.3
+  batch_size: int = 8
+  epochs: int = 15
+  decay_steps: int = 8
   decay_rate: float = 0.96
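The new defaults above (higher learning rate, smaller batches, more epochs, plus a revised QAT schedule) can also be overridden per run. A hedged sketch of doing so through the public Model Maker API (assuming the `mediapipe-model-maker` package; the values simply mirror the defaults in this diff and `exported_model` is a placeholder directory):

```python
from mediapipe_model_maker import object_detector

# Regular training hyperparameters, mirroring the new HParams defaults.
hparams = object_detector.HParams(
    learning_rate=0.3,
    batch_size=8,
    epochs=30,
    export_dir='exported_model',  # placeholder output directory
)

# Quantization-aware training hyperparameters, mirroring QATHParams.
qat_hparams = object_detector.QATHParams(
    learning_rate=0.3,
    batch_size=8,
    epochs=15,
    decay_steps=8,
    decay_rate=0.96,
)

options = object_detector.ObjectDetectorOptions(
    supported_model=object_detector.SupportedModels.MOBILENET_V2,
    hparams=hparams,
)
```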

View File

@@ -18,6 +18,7 @@ from typing import Mapping, Optional, Sequence, Union
 import tensorflow as tf
 
+from mediapipe.model_maker.python.vision.object_detector import detection
 from mediapipe.model_maker.python.vision.object_detector import model_options as model_opt
 from mediapipe.model_maker.python.vision.object_detector import model_spec as ms
 from official.core import config_definitions as cfg
@@ -29,7 +30,6 @@ from official.vision.losses import loss_utils
 from official.vision.modeling import factory
 from official.vision.modeling import retinanet_model
 from official.vision.modeling.layers import detection_generator
-from official.vision.serving import detection
 
 
 class ObjectDetectorModel(tf.keras.Model):
@@ -199,6 +199,7 @@ class ObjectDetectorModel(tf.keras.Model):
             max_detections=10,
             max_classes_per_detection=1,
             normalize_anchor_coordinates=True,
+            omit_nms=True,
         ),
     )
     tflite_post_processing_config = (

View File

@@ -28,6 +28,7 @@ from mediapipe.model_maker.python.vision.object_detector import model_options as
 from mediapipe.model_maker.python.vision.object_detector import model_spec as ms
 from mediapipe.model_maker.python.vision.object_detector import object_detector_options
 from mediapipe.model_maker.python.vision.object_detector import preprocessor
+from mediapipe.tasks.python.metadata.metadata_writers import metadata_info
 from mediapipe.tasks.python.metadata.metadata_writers import metadata_writer
 from mediapipe.tasks.python.metadata.metadata_writers import object_detector as object_detector_writer
 from official.vision.evaluation import coco_evaluator
@@ -264,6 +265,27 @@ class ObjectDetector(classifier.Classifier):
     coco_metrics = coco_eval.result()
     return losses, coco_metrics
 
+  def _create_fixed_anchor(
+      self, anchor_box: List[float]
+  ) -> object_detector_writer.FixedAnchor:
+    """Helper function to create FixedAnchor objects from an anchor box array.
+
+    Args:
+      anchor_box: List of anchor box coordinates in the format of [y_min,
+        x_min, y_max, x_max].
+
+    Returns:
+      A FixedAnchor object representing the anchor_box.
+    """
+    image_shape = self._model_spec.input_image_shape[:2]
+    y_center_norm = (anchor_box[0] + anchor_box[2]) / (2 * image_shape[0])
+    x_center_norm = (anchor_box[1] + anchor_box[3]) / (2 * image_shape[1])
+    height_norm = (anchor_box[2] - anchor_box[0]) / image_shape[0]
+    width_norm = (anchor_box[3] - anchor_box[1]) / image_shape[1]
+    return object_detector_writer.FixedAnchor(
+        x_center_norm, y_center_norm, width_norm, height_norm
+    )
+
   def export_model(
       self,
       model_name: str = 'model.tflite',
@@ -328,11 +350,40 @@ class ObjectDetector(classifier.Classifier):
     converter.target_spec.supported_ops = (tf.lite.OpsSet.TFLITE_BUILTINS,)
     tflite_model = converter.convert()
 
-    writer = object_detector_writer.MetadataWriter.create_for_models_with_nms(
+    # Build anchors
+    raw_anchor_boxes = self._preprocessor.anchor_boxes
+    anchors = []
+    for _, anchor_boxes in raw_anchor_boxes.items():
+      anchor_boxes_reshaped = anchor_boxes.numpy().reshape((-1, 4))
+      for ab in anchor_boxes_reshaped:
+        anchors.append(self._create_fixed_anchor(ab))
+
+    ssd_anchors_options = object_detector_writer.SsdAnchorsOptions(
+        object_detector_writer.FixedAnchorsSchema(anchors)
+    )
+
+    tensor_decoding_options = object_detector_writer.TensorsDecodingOptions(
+        num_classes=self._num_classes,
+        num_boxes=len(anchors),
+        num_coords=4,
+        keypoint_coord_offset=0,
+        num_keypoints=0,
+        num_values_per_keypoint=2,
+        x_scale=1,
+        y_scale=1,
+        w_scale=1,
+        h_scale=1,
+        apply_exponential_on_box_size=True,
+        sigmoid_score=False,
+    )
+
+    writer = object_detector_writer.MetadataWriter.create_for_models_without_nms(
         tflite_model,
         self._model_spec.mean_rgb,
         self._model_spec.stddev_rgb,
         labels=metadata_writer.Labels().add(list(self._label_names)),
+        ssd_anchors_options=ssd_anchors_options,
+        tensors_decoding_options=tensor_decoding_options,
+        output_tensors_order=metadata_info.RawDetectionOutputTensorsOrder.LOCATION_SCORE,
     )
     tflite_model_with_metadata, metadata_json = writer.populate()
     model_util.save_tflite(tflite_model_with_metadata, tflite_file)
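The rewritten export path above attaches SSD anchors and tensor-decoding options to a model exported without built-in NMS. A hedged, end-to-end sketch of the train-and-export flow that exercises this code (assuming the `mediapipe-model-maker` package; dataset paths and the export directory are placeholders):

```python
from mediapipe_model_maker import object_detector

# Placeholder dataset paths; see the PASCAL VOC layout described earlier.
train_data = object_detector.Dataset.from_pascal_voc_folder('dataset/train')
validation_data = object_detector.Dataset.from_pascal_voc_folder('dataset/validation')

options = object_detector.ObjectDetectorOptions(
    supported_model=object_detector.SupportedModels.MOBILENET_V2,
    hparams=object_detector.HParams(export_dir='exported_model'),
)
model = object_detector.ObjectDetector.create(
    train_data=train_data,
    validation_data=validation_data,
    options=options,
)

loss, coco_metrics = model.evaluate(validation_data, batch_size=4)
# export_model() now embeds the anchors and decoding options in the TFLite
# metadata, so MediaPipe Tasks can run detection post-processing downstream.
model.export_model()
```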

View File

@ -44,6 +44,26 @@ class Preprocessor(object):
self._aug_scale_max = 2.0 self._aug_scale_max = 2.0
self._max_num_instances = 100 self._max_num_instances = 100
self._padded_size = preprocess_ops.compute_padded_size(
self._output_size, 2**self._max_level
)
input_anchor = anchor.build_anchor_generator(
min_level=self._min_level,
max_level=self._max_level,
num_scales=self._num_scales,
aspect_ratios=self._aspect_ratios,
anchor_size=self._anchor_size,
)
self._anchor_boxes = input_anchor(image_size=self._output_size)
self._anchor_labeler = anchor.AnchorLabeler(
self._match_threshold, self._unmatched_threshold
)
@property
def anchor_boxes(self):
return self._anchor_boxes
def __call__( def __call__(
self, data: Mapping[str, Any], is_training: bool = True self, data: Mapping[str, Any], is_training: bool = True
) -> Tuple[tf.Tensor, Mapping[str, Any]]: ) -> Tuple[tf.Tensor, Mapping[str, Any]]:
@ -90,13 +110,10 @@ class Preprocessor(object):
image, image_info = preprocess_ops.resize_and_crop_image( image, image_info = preprocess_ops.resize_and_crop_image(
image, image,
self._output_size, self._output_size,
padded_size=preprocess_ops.compute_padded_size( padded_size=self._padded_size,
self._output_size, 2**self._max_level
),
aug_scale_min=(self._aug_scale_min if is_training else 1.0), aug_scale_min=(self._aug_scale_min if is_training else 1.0),
aug_scale_max=(self._aug_scale_max if is_training else 1.0), aug_scale_max=(self._aug_scale_max if is_training else 1.0),
) )
image_height, image_width, _ = image.get_shape().as_list()
# Resize and crop boxes. # Resize and crop boxes.
image_scale = image_info[2, :] image_scale = image_info[2, :]
@ -110,20 +127,9 @@ class Preprocessor(object):
classes = tf.gather(classes, indices) classes = tf.gather(classes, indices)
# Assign anchors. # Assign anchors.
input_anchor = anchor.build_anchor_generator(
min_level=self._min_level,
max_level=self._max_level,
num_scales=self._num_scales,
aspect_ratios=self._aspect_ratios,
anchor_size=self._anchor_size,
)
anchor_boxes = input_anchor(image_size=(image_height, image_width))
anchor_labeler = anchor.AnchorLabeler(
self._match_threshold, self._unmatched_threshold
)
(cls_targets, box_targets, _, cls_weights, box_weights) = ( (cls_targets, box_targets, _, cls_weights, box_weights) = (
anchor_labeler.label_anchors( self._anchor_labeler.label_anchors(
anchor_boxes, boxes, tf.expand_dims(classes, axis=1) self.anchor_boxes, boxes, tf.expand_dims(classes, axis=1)
) )
) )
@ -134,7 +140,7 @@ class Preprocessor(object):
labels = { labels = {
'cls_targets': cls_targets, 'cls_targets': cls_targets,
'box_targets': box_targets, 'box_targets': box_targets,
'anchor_boxes': anchor_boxes, 'anchor_boxes': self.anchor_boxes,
'cls_weights': cls_weights, 'cls_weights': cls_weights,
'box_weights': box_weights, 'box_weights': box_weights,
'image_info': image_info, 'image_info': image_info,
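The motivation for this hunk is that anchor generation and anchor labeling depend only on the fixed output size, so they can be built once in the constructor instead of once per example inside `__call__`. A small, self-contained sketch of the same caching pattern follows; the names are illustrative, not the real `Preprocessor` API.

```python
class AnchorCachingPreprocessor:
    """Illustrative caching pattern: size-dependent state is built once."""

    def __init__(self, output_size, build_anchors):
        self._output_size = output_size
        # Built a single time; every call below reuses the same result.
        self._anchor_boxes = build_anchors(image_size=output_size)

    @property
    def anchor_boxes(self):
        return self._anchor_boxes

    def __call__(self, example):
        # Only per-example work happens here; no anchor regeneration.
        return {"example": example, "anchor_boxes": self.anchor_boxes}


prep = AnchorCachingPreprocessor((256, 256), build_anchors=lambda image_size: [image_size])
print(prep("img")["anchor_boxes"])  # [(256, 256)]
```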

----

@ -361,9 +361,10 @@ class FaceStylizerGraph : public core::ModelTaskGraph {
auto& tensors_to_image = auto& tensors_to_image =
graph.AddNode("mediapipe.tasks.TensorsToImageCalculator"); graph.AddNode("mediapipe.tasks.TensorsToImageCalculator");
ConfigureTensorsToImageCalculator( auto& tensors_to_image_options =
image_to_tensor_options, tensors_to_image.GetOptions<TensorsToImageCalculatorOptions>();
&tensors_to_image.GetOptions<TensorsToImageCalculatorOptions>()); tensors_to_image_options.mutable_input_tensor_float_range()->set_min(-1);
tensors_to_image_options.mutable_input_tensor_float_range()->set_max(1);
face_alignment_image >> tensors_to_image.In(kTensorsTag); face_alignment_image >> tensors_to_image.In(kTensorsTag);
face_alignment = tensors_to_image.Out(kImageTag).Cast<Image>(); face_alignment = tensors_to_image.Out(kImageTag).Cast<Image>();
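Setting the input tensor float range to [-1, 1] tells the downstream `TensorsToImageCalculator` how to scale the stylizer's float output back into pixel values. A hedged sketch of that kind of range mapping is shown below; the calculator's real implementation may differ in rounding and clamping details.

```python
def float_to_pixel(value, range_min=-1.0, range_max=1.0):
    """Maps a float in [range_min, range_max] to a uint8 pixel value."""
    clamped = min(max(value, range_min), range_max)
    return round((clamped - range_min) / (range_max - range_min) * 255)

print(float_to_pixel(-1.0), float_to_pixel(0.0), float_to_pixel(1.0))  # 0 128 255
```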

----

@ -63,6 +63,8 @@ cc_library(
"//mediapipe/calculators/image:image_properties_calculator", "//mediapipe/calculators/image:image_properties_calculator",
"//mediapipe/calculators/image:image_transformation_calculator", "//mediapipe/calculators/image:image_transformation_calculator",
"//mediapipe/calculators/image:image_transformation_calculator_cc_proto", "//mediapipe/calculators/image:image_transformation_calculator_cc_proto",
"//mediapipe/calculators/image:set_alpha_calculator",
"//mediapipe/calculators/image:set_alpha_calculator_cc_proto",
"//mediapipe/calculators/tensor:image_to_tensor_calculator", "//mediapipe/calculators/tensor:image_to_tensor_calculator",
"//mediapipe/calculators/tensor:image_to_tensor_calculator_cc_proto", "//mediapipe/calculators/tensor:image_to_tensor_calculator_cc_proto",
"//mediapipe/calculators/tensor:inference_calculator", "//mediapipe/calculators/tensor:inference_calculator",

----

@ -188,7 +188,7 @@ void main() {
// Special argmax shader for N=1 classes. We don't need to worry about softmax // Special argmax shader for N=1 classes. We don't need to worry about softmax
// activation (it is assumed softmax requires N > 1 classes), but this should // activation (it is assumed softmax requires N > 1 classes), but this should
// occur after SIGMOID activation if specified. Instead of a true argmax, we // occur after SIGMOID activation if specified. Instead of a true argmax, we
// simply use 0.5 as the cutoff, assigning 1 (foreground) or 0 (background) // simply use 0.5 as the cutoff, assigning 0 (foreground) or 255 (background)
// based on whether the confidence value reaches this cutoff or not, // based on whether the confidence value reaches this cutoff or not,
// respectively. // respectively.
static constexpr char kArgmaxOneClassShader[] = R"( static constexpr char kArgmaxOneClassShader[] = R"(
@ -199,12 +199,12 @@ uniform sampler2D input_texture;
void main() { void main() {
float input_val = texture2D(input_texture, sample_coordinate).x; float input_val = texture2D(input_texture, sample_coordinate).x;
// Category is just value rounded to nearest integer; then we map to either // Category is just value rounded to nearest integer; then we map to either
// 0 or 1/255 accordingly. If the input has been activated properly, then the // 0 or 1 accordingly. If the input has been activated properly, then the
// values should always be in the range [0, 1]. But just in case it hasn't, to // values should always be in the range [0, 1]. But just in case it hasn't, to
// avoid category overflow issues when the activation function is not properly // avoid category overflow issues when the activation function is not properly
// chosen, we add an extra clamp here, as performance hit is minimal. // chosen, we add an extra clamp here, as performance hit is minimal.
float category = clamp(floor(input_val + 0.5), 0.0, 1.0); float category = clamp(floor(1.5 - input_val), 0.0, 1.0);
gl_FragColor = vec4(category / 255.0, 0.0, 0.0, 1.0); gl_FragColor = vec4(category, 0.0, 0.0, 1.0);
})"; })";
// Softmax is in 3 steps: // Softmax is in 3 steps:
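The revised one-class shader keeps the 0.5 cutoff but flips the label convention: `floor(1.5 - x)` with a clamp maps confident foreground to 0 and background to 1, and writing the value unscaled means the background ends up as 255 once the output is stored as uint8. The snippet below is only a Python restatement of the two GLSL expressions above, not MediaPipe code.

```python
import math

def old_category(conf):
    # Old shader: foreground = 1, background = 0, later written as value / 255.
    return min(max(math.floor(conf + 0.5), 0.0), 1.0)

def new_category(conf):
    # New shader: foreground = 0, background = 1 (255 once stored in a uint8 buffer).
    return min(max(math.floor(1.5 - conf), 0.0), 1.0)

for conf in (0.0, 0.25, 0.49, 0.51, 0.75, 1.0):
    print(f"conf={conf:.2f}  old={old_category(conf):.0f}  new={new_category(conf):.0f}")
```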

----

@ -61,6 +61,8 @@ using ::mediapipe::tasks::vision::GetImageLikeTensorShape;
using ::mediapipe::tasks::vision::Shape; using ::mediapipe::tasks::vision::Shape;
using ::mediapipe::tasks::vision::image_segmenter::proto::SegmenterOptions; using ::mediapipe::tasks::vision::image_segmenter::proto::SegmenterOptions;
constexpr uint8_t kUnLabeledPixelValue = 255;
void StableSoftmax(absl::Span<const float> values, void StableSoftmax(absl::Span<const float> values,
absl::Span<float> activated_values) { absl::Span<float> activated_values) {
float max_value = *std::max_element(values.begin(), values.end()); float max_value = *std::max_element(values.begin(), values.end());
@ -153,9 +155,11 @@ Image ProcessForCategoryMaskCpu(const Shape& input_shape,
} }
if (input_channels == 1) { if (input_channels == 1) {
// if the input tensor is a single mask, it is assumed to be a binary // if the input tensor is a single mask, it is assumed to be a binary
// foreground segmentation mask. For such a mask, we make foreground // foreground segmentation mask. For such a mask, instead of a true
// category 1, and background category 0. // argmax, we simply use 0.5 as the cutoff, assigning 0 (foreground) or
pixel = static_cast<uint8_t>(confidence_scores[0] > 0.5f); // 255 (background) based on whether the confidence value reaches this
// cutoff or not, respectively.
pixel = confidence_scores[0] > 0.5f ? 0 : kUnLabeledPixelValue;
} else { } else {
const int maximum_category_idx = const int maximum_category_idx =
std::max_element(confidence_scores.begin(), confidence_scores.end()) - std::max_element(confidence_scores.begin(), confidence_scores.end()) -
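On the CPU path the same convention is applied per pixel: a single-channel model is treated as a binary foreground mask with foreground written as 0 and everything else as the unlabeled value 255, while multi-channel models still take the argmax over class confidences. A minimal sketch of that per-pixel decision, assuming plain Python lists rather than the real tensor types:

```python
UNLABELED_PIXEL_VALUE = 255  # mirrors kUnLabeledPixelValue above

def category_pixel(confidence_scores):
    """Per-pixel category selection, sketched after the C++ logic above."""
    if len(confidence_scores) == 1:
        # Binary mask: 0 = foreground, 255 = background / unlabeled.
        return 0 if confidence_scores[0] > 0.5 else UNLABELED_PIXEL_VALUE
    # Multi-class mask: index of the highest confidence wins.
    return max(range(len(confidence_scores)), key=lambda i: confidence_scores[i])

print(category_pixel([0.8]))            # 0 (foreground)
print(category_pixel([0.2]))            # 255 (unlabeled)
print(category_pixel([0.1, 0.7, 0.2]))  # 1
```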

----

@ -23,6 +23,7 @@ limitations under the License.
#include "absl/strings/str_format.h" #include "absl/strings/str_format.h"
#include "mediapipe/calculators/image/image_clone_calculator.pb.h" #include "mediapipe/calculators/image/image_clone_calculator.pb.h"
#include "mediapipe/calculators/image/image_transformation_calculator.pb.h" #include "mediapipe/calculators/image/image_transformation_calculator.pb.h"
#include "mediapipe/calculators/image/set_alpha_calculator.pb.h"
#include "mediapipe/calculators/tensor/tensor_converter_calculator.pb.h" #include "mediapipe/calculators/tensor/tensor_converter_calculator.pb.h"
#include "mediapipe/framework/api2/builder.h" #include "mediapipe/framework/api2/builder.h"
#include "mediapipe/framework/api2/port.h" #include "mediapipe/framework/api2/port.h"
@ -249,7 +250,8 @@ void ConfigureTensorConverterCalculator(
// the tflite model. // the tflite model.
absl::StatusOr<ImageAndTensorsOnDevice> ConvertImageToTensors( absl::StatusOr<ImageAndTensorsOnDevice> ConvertImageToTensors(
Source<Image> image_in, Source<NormalizedRect> norm_rect_in, bool use_gpu, Source<Image> image_in, Source<NormalizedRect> norm_rect_in, bool use_gpu,
const core::ModelResources& model_resources, Graph& graph) { bool is_hair_segmentation, const core::ModelResources& model_resources,
Graph& graph) {
ASSIGN_OR_RETURN(const tflite::Tensor* tflite_input_tensor, ASSIGN_OR_RETURN(const tflite::Tensor* tflite_input_tensor,
GetInputTensor(model_resources)); GetInputTensor(model_resources));
if (tflite_input_tensor->shape()->size() != 4) { if (tflite_input_tensor->shape()->size() != 4) {
@ -294,9 +296,17 @@ absl::StatusOr<ImageAndTensorsOnDevice> ConvertImageToTensors(
// Convert from Image to legacy ImageFrame or GpuBuffer. // Convert from Image to legacy ImageFrame or GpuBuffer.
auto& from_image = graph.AddNode("FromImageCalculator"); auto& from_image = graph.AddNode("FromImageCalculator");
image_on_device >> from_image.In(kImageTag); image_on_device >> from_image.In(kImageTag);
auto image_cpu_or_gpu = Source<api2::AnyType> image_cpu_or_gpu =
from_image.Out(use_gpu ? kImageGpuTag : kImageCpuTag); from_image.Out(use_gpu ? kImageGpuTag : kImageCpuTag);
if (is_hair_segmentation) {
auto& set_alpha = graph.AddNode("SetAlphaCalculator");
set_alpha.GetOptions<mediapipe::SetAlphaCalculatorOptions>()
.set_alpha_value(0);
image_cpu_or_gpu >> set_alpha.In(use_gpu ? kImageGpuTag : kImageTag);
image_cpu_or_gpu = set_alpha.Out(use_gpu ? kImageGpuTag : kImageTag);
}
// Resize the input image to the model input size. // Resize the input image to the model input size.
auto& image_transformation = graph.AddNode("ImageTransformationCalculator"); auto& image_transformation = graph.AddNode("ImageTransformationCalculator");
ConfigureImageTransformationCalculator( ConfigureImageTransformationCalculator(
@ -461,22 +471,41 @@ class ImageSegmenterGraph : public core::ModelTaskGraph {
bool use_gpu = bool use_gpu =
components::processors::DetermineImagePreprocessingGpuBackend( components::processors::DetermineImagePreprocessingGpuBackend(
task_options.base_options().acceleration()); task_options.base_options().acceleration());
ASSIGN_OR_RETURN(auto image_and_tensors,
ConvertImageToTensors(image_in, norm_rect_in, use_gpu,
model_resources, graph));
// Adds inference subgraph and connects its input stream to the output
// tensors produced by the ImageToTensorCalculator.
auto& inference = AddInference(
model_resources, task_options.base_options().acceleration(), graph);
image_and_tensors.tensors >> inference.In(kTensorsTag);
// Adds segmentation calculators for output streams. // Adds segmentation calculators for output streams. Add this calculator
// first to get the labels.
auto& tensor_to_images = auto& tensor_to_images =
graph.AddNode("mediapipe.tasks.TensorsToSegmentationCalculator"); graph.AddNode("mediapipe.tasks.TensorsToSegmentationCalculator");
RET_CHECK_OK(ConfigureTensorsToSegmentationCalculator( RET_CHECK_OK(ConfigureTensorsToSegmentationCalculator(
task_options, model_resources, task_options, model_resources,
&tensor_to_images &tensor_to_images
.GetOptions<TensorsToSegmentationCalculatorOptions>())); .GetOptions<TensorsToSegmentationCalculatorOptions>()));
const auto& tensor_to_images_options =
tensor_to_images.GetOptions<TensorsToSegmentationCalculatorOptions>();
// TODO: remove special logic for hair segmentation model.
// The alpha channel of the hair segmentation model indicates the region of
// interest. The model was designed for live stream mode, where the mask of
// the previous frame is used as the indicator for the next frame. For the
// first frame, it expects the alpha channel to be empty. To consolidate
// IMAGE, VIDEO and LIVE_STREAM modes in MediaPipe Tasks, we forcibly set the
// alpha channel to be empty if the model is detected to be the hair
// segmentation model.
bool is_hair_segmentation = false;
if (tensor_to_images_options.label_items_size() == 2 &&
tensor_to_images_options.label_items().at(1).name() == "hair") {
is_hair_segmentation = true;
}
ASSIGN_OR_RETURN(
auto image_and_tensors,
ConvertImageToTensors(image_in, norm_rect_in, use_gpu,
is_hair_segmentation, model_resources, graph));
// Adds inference subgraph and connects its input stream to the output
// tensors produced by the ImageToTensorCalculator.
auto& inference = AddInference(
model_resources, task_options.base_options().acceleration(), graph);
image_and_tensors.tensors >> inference.In(kTensorsTag);
inference.Out(kTensorsTag) >> tensor_to_images.In(kTensorsTag); inference.Out(kTensorsTag) >> tensor_to_images.In(kTensorsTag);
// Adds image property calculator for output size. // Adds image property calculator for output size.
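The hair-segmentation special case is only triggered when the model is recognized from its label map. The check itself is small; here is a hedged Python paraphrase, where the `label_items` mapping is an assumption standing in for the proto field of the same name.

```python
def needs_empty_alpha(label_items):
    """True when the model looks like the hair segmentation model.

    label_items maps label index -> label name, e.g. {0: "background", 1: "hair"}.
    Only a two-label model whose second label is "hair" gets the SetAlphaCalculator
    inserted, so its alpha channel (the previous-frame mask) starts out empty.
    """
    return len(label_items) == 2 and label_items.get(1) == "hair"

print(needs_empty_alpha({0: "background", 1: "hair"}))    # True
print(needs_empty_alpha({0: "background", 1: "person"}))  # False
```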

----

@ -30,6 +30,7 @@ limitations under the License.
#include "mediapipe/framework/port/opencv_imgcodecs_inc.h" #include "mediapipe/framework/port/opencv_imgcodecs_inc.h"
#include "mediapipe/framework/port/opencv_imgproc_inc.h" #include "mediapipe/framework/port/opencv_imgproc_inc.h"
#include "mediapipe/framework/port/status_matchers.h" #include "mediapipe/framework/port/status_matchers.h"
#include "mediapipe/framework/tool/test_util.h"
#include "mediapipe/tasks/cc/components/containers/rect.h" #include "mediapipe/tasks/cc/components/containers/rect.h"
#include "mediapipe/tasks/cc/core/base_options.h" #include "mediapipe/tasks/cc/core/base_options.h"
#include "mediapipe/tasks/cc/core/proto/base_options.pb.h" #include "mediapipe/tasks/cc/core/proto/base_options.pb.h"
@ -425,6 +426,28 @@ TEST_F(ImageModeTest, SucceedsSelfie144x256Segmentations) {
SimilarToFloatMask(expected_mask_float, kGoldenMaskSimilarity)); SimilarToFloatMask(expected_mask_float, kGoldenMaskSimilarity));
} }
TEST_F(ImageModeTest, SucceedsSelfieSegmentationSingleLabel) {
auto options = std::make_unique<ImageSegmenterOptions>();
options->base_options.model_asset_path =
JoinPath("./", kTestDataDirectory, kSelfieSegmentation);
MP_ASSERT_OK_AND_ASSIGN(std::unique_ptr<ImageSegmenter> segmenter,
ImageSegmenter::Create(std::move(options)));
ASSERT_EQ(segmenter->GetLabels().size(), 1);
EXPECT_EQ(segmenter->GetLabels()[0], "selfie");
MP_ASSERT_OK(segmenter->Close());
}
TEST_F(ImageModeTest, SucceedsSelfieSegmentationLandscapeSingleLabel) {
auto options = std::make_unique<ImageSegmenterOptions>();
options->base_options.model_asset_path =
JoinPath("./", kTestDataDirectory, kSelfieSegmentationLandscape);
MP_ASSERT_OK_AND_ASSIGN(std::unique_ptr<ImageSegmenter> segmenter,
ImageSegmenter::Create(std::move(options)));
ASSERT_EQ(segmenter->GetLabels().size(), 1);
EXPECT_EQ(segmenter->GetLabels()[0], "selfie");
MP_ASSERT_OK(segmenter->Close());
}
TEST_F(ImageModeTest, SucceedsPortraitSelfieSegmentationConfidenceMask) { TEST_F(ImageModeTest, SucceedsPortraitSelfieSegmentationConfidenceMask) {
Image image = Image image =
GetSRGBImage(JoinPath("./", kTestDataDirectory, "portrait.jpg")); GetSRGBImage(JoinPath("./", kTestDataDirectory, "portrait.jpg"));
@ -464,6 +487,9 @@ TEST_F(ImageModeTest, SucceedsPortraitSelfieSegmentationCategoryMask) {
EXPECT_TRUE(result.category_mask.has_value()); EXPECT_TRUE(result.category_mask.has_value());
MP_ASSERT_OK(segmenter->Close()); MP_ASSERT_OK(segmenter->Close());
MP_EXPECT_OK(
SavePngTestOutput(*result.category_mask->GetImageFrameSharedPtr(),
"portrait_selfie_segmentation_expected_category_mask"));
cv::Mat selfie_mask = mediapipe::formats::MatView( cv::Mat selfie_mask = mediapipe::formats::MatView(
result.category_mask->GetImageFrameSharedPtr().get()); result.category_mask->GetImageFrameSharedPtr().get());
cv::Mat expected_mask = cv::imread( cv::Mat expected_mask = cv::imread(
@ -471,7 +497,7 @@ TEST_F(ImageModeTest, SucceedsPortraitSelfieSegmentationCategoryMask) {
"portrait_selfie_segmentation_expected_category_mask.jpg"), "portrait_selfie_segmentation_expected_category_mask.jpg"),
cv::IMREAD_GRAYSCALE); cv::IMREAD_GRAYSCALE);
EXPECT_THAT(selfie_mask, EXPECT_THAT(selfie_mask,
SimilarToUint8Mask(expected_mask, kGoldenMaskSimilarity, 255)); SimilarToUint8Mask(expected_mask, kGoldenMaskSimilarity, 1));
} }
TEST_F(ImageModeTest, SucceedsPortraitSelfieSegmentationLandscapeCategoryMask) { TEST_F(ImageModeTest, SucceedsPortraitSelfieSegmentationLandscapeCategoryMask) {
@ -487,6 +513,9 @@ TEST_F(ImageModeTest, SucceedsPortraitSelfieSegmentationLandscapeCategoryMask) {
EXPECT_TRUE(result.category_mask.has_value()); EXPECT_TRUE(result.category_mask.has_value());
MP_ASSERT_OK(segmenter->Close()); MP_ASSERT_OK(segmenter->Close());
MP_EXPECT_OK(SavePngTestOutput(
*result.category_mask->GetImageFrameSharedPtr(),
"portrait_selfie_segmentation_landscape_expected_category_mask"));
cv::Mat selfie_mask = mediapipe::formats::MatView( cv::Mat selfie_mask = mediapipe::formats::MatView(
result.category_mask->GetImageFrameSharedPtr().get()); result.category_mask->GetImageFrameSharedPtr().get());
cv::Mat expected_mask = cv::imread( cv::Mat expected_mask = cv::imread(
@ -495,7 +524,7 @@ TEST_F(ImageModeTest, SucceedsPortraitSelfieSegmentationLandscapeCategoryMask) {
"portrait_selfie_segmentation_landscape_expected_category_mask.jpg"), "portrait_selfie_segmentation_landscape_expected_category_mask.jpg"),
cv::IMREAD_GRAYSCALE); cv::IMREAD_GRAYSCALE);
EXPECT_THAT(selfie_mask, EXPECT_THAT(selfie_mask,
SimilarToUint8Mask(expected_mask, kGoldenMaskSimilarity, 255)); SimilarToUint8Mask(expected_mask, kGoldenMaskSimilarity, 1));
} }
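The third argument to `SimilarToUint8Mask` changes from 255 to 1 because category masks are now written with pixel values 0/255 rather than 0/1, so the actual mask no longer needs to be magnified before comparing against the golden image. A rough sketch of that comparison, with NumPy standing in for the OpenCV matrices used in the test:

```python
import numpy as np

def similar_to_uint8_mask(actual, expected, similarity_threshold, magnification_factor=1):
    """Sketch of a SimilarToUint8Mask-style check: scale, then count matching pixels."""
    scaled = actual.astype(np.int32) * magnification_factor
    matching = np.count_nonzero(scaled == expected.astype(np.int32))
    return matching / expected.size >= similarity_threshold

golden = np.array([[0, 255], [255, 0]], dtype=np.uint8)
# Masks already written as 0/255 need no magnification (factor 1).
print(similar_to_uint8_mask(golden.copy(), golden, 0.98, magnification_factor=1))  # True
```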
TEST_F(ImageModeTest, SucceedsHairSegmentation) { TEST_F(ImageModeTest, SucceedsHairSegmentation) {

----

@ -129,9 +129,17 @@ absl::StatusOr<std::unique_ptr<ObjectDetector>> ObjectDetector::Create(
if (status_or_packets.value()[kImageOutStreamName].IsEmpty()) { if (status_or_packets.value()[kImageOutStreamName].IsEmpty()) {
return; return;
} }
Packet image_packet = status_or_packets.value()[kImageOutStreamName];
Packet detections_packet = Packet detections_packet =
status_or_packets.value()[kDetectionsOutStreamName]; status_or_packets.value()[kDetectionsOutStreamName];
Packet image_packet = status_or_packets.value()[kImageOutStreamName]; if (detections_packet.IsEmpty()) {
Packet empty_packet =
status_or_packets.value()[kDetectionsOutStreamName];
result_callback(
{ConvertToDetectionResult({})}, image_packet.Get<Image>(),
empty_packet.Timestamp().Value() / kMicroSecondsPerMilliSecond);
return;
}
result_callback(ConvertToDetectionResult( result_callback(ConvertToDetectionResult(
detections_packet.Get<std::vector<Detection>>()), detections_packet.Get<std::vector<Detection>>()),
image_packet.Get<Image>(), image_packet.Get<Image>(),
@ -165,6 +173,9 @@ absl::StatusOr<ObjectDetectorResult> ObjectDetector::Detect(
ProcessImageData( ProcessImageData(
{{kImageInStreamName, MakePacket<Image>(std::move(image))}, {{kImageInStreamName, MakePacket<Image>(std::move(image))},
{kNormRectName, MakePacket<NormalizedRect>(std::move(norm_rect))}})); {kNormRectName, MakePacket<NormalizedRect>(std::move(norm_rect))}}));
if (output_packets[kDetectionsOutStreamName].IsEmpty()) {
return {ConvertToDetectionResult({})};
}
return ConvertToDetectionResult( return ConvertToDetectionResult(
output_packets[kDetectionsOutStreamName].Get<std::vector<Detection>>()); output_packets[kDetectionsOutStreamName].Get<std::vector<Detection>>());
} }
@ -190,6 +201,9 @@ absl::StatusOr<ObjectDetectorResult> ObjectDetector::DetectForVideo(
{kNormRectName, {kNormRectName,
MakePacket<NormalizedRect>(std::move(norm_rect)) MakePacket<NormalizedRect>(std::move(norm_rect))
.At(Timestamp(timestamp_ms * kMicroSecondsPerMilliSecond))}})); .At(Timestamp(timestamp_ms * kMicroSecondsPerMilliSecond))}}));
if (output_packets[kDetectionsOutStreamName].IsEmpty()) {
return {ConvertToDetectionResult({})};
}
return ConvertToDetectionResult( return ConvertToDetectionResult(
output_packets[kDetectionsOutStreamName].Get<std::vector<Detection>>()); output_packets[kDetectionsOutStreamName].Get<std::vector<Detection>>());
} }
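All three calling paths (blocking, video, and live stream) now treat an empty detections packet as a valid, empty result rather than reading through it. The guard itself is simple; a hedged Python sketch of the same idea, with a dictionary standing in for `ObjectDetectorResult`:

```python
from typing import List, Optional

def to_detection_result(detections: Optional[List[dict]]) -> dict:
    """An absent or empty detections payload becomes an empty result, not an error."""
    if not detections:
        return {"detections": []}
    return {"detections": detections}

print(to_detection_result(None))                               # {'detections': []}
print(to_detection_result([{"category": "cat", "score": 0.9}]))
```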

----

@ -499,6 +499,22 @@ TEST_F(ImageModeTest, SucceedsEfficientDetNoNmsModel) {
})pb")})); })pb")}));
} }
TEST_F(ImageModeTest, SucceedsNoObjectDetected) {
MP_ASSERT_OK_AND_ASSIGN(Image image,
DecodeImageFromFile(JoinPath("./", kTestDataDirectory,
"cats_and_dogs.jpg")));
auto options = std::make_unique<ObjectDetectorOptions>();
options->max_results = 4;
options->score_threshold = 1.0f;
options->base_options.model_asset_path =
JoinPath("./", kTestDataDirectory, kEfficientDetWithoutNms);
MP_ASSERT_OK_AND_ASSIGN(std::unique_ptr<ObjectDetector> object_detector,
ObjectDetector::Create(std::move(options)));
MP_ASSERT_OK_AND_ASSIGN(auto results, object_detector->Detect(image));
MP_ASSERT_OK(object_detector->Close());
EXPECT_THAT(results.detections, testing::IsEmpty());
}
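The same no-detection behavior can be exercised from the MediaPipe Tasks Python API. The sketch below is an assumption-laden translation of the C++ test above: the option and method names follow the public Python API, but the model and image paths are placeholders.

```python
import mediapipe as mp
from mediapipe.tasks import python as mp_tasks
from mediapipe.tasks.python import vision

options = vision.ObjectDetectorOptions(
    base_options=mp_tasks.BaseOptions(model_asset_path="efficientdet_lite0.tflite"),
    max_results=4,
    score_threshold=1.0,  # nothing can reach this threshold
)
with vision.ObjectDetector.create_from_options(options) as detector:
    image = mp.Image.create_from_file("cats_and_dogs.jpg")
    result = detector.detect(image)
    # With this change an empty detection packet yields an empty list.
    print(result.detections)  # expected: []
```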
TEST_F(ImageModeTest, SucceedsWithoutImageResizing) { TEST_F(ImageModeTest, SucceedsWithoutImageResizing) {
MP_ASSERT_OK_AND_ASSIGN(Image image, DecodeImageFromFile(JoinPath( MP_ASSERT_OK_AND_ASSIGN(Image image, DecodeImageFromFile(JoinPath(
"./", kTestDataDirectory, "./", kTestDataDirectory,

----

@ -63,8 +63,6 @@ constexpr char kNormLandmarksTag[] = "NORM_LANDMARKS";
constexpr char kNormLandmarksStreamName[] = "norm_landmarks"; constexpr char kNormLandmarksStreamName[] = "norm_landmarks";
constexpr char kPoseWorldLandmarksTag[] = "WORLD_LANDMARKS"; constexpr char kPoseWorldLandmarksTag[] = "WORLD_LANDMARKS";
constexpr char kPoseWorldLandmarksStreamName[] = "world_landmarks"; constexpr char kPoseWorldLandmarksStreamName[] = "world_landmarks";
constexpr char kPoseAuxiliaryLandmarksTag[] = "AUXILIARY_LANDMARKS";
constexpr char kPoseAuxiliaryLandmarksStreamName[] = "auxiliary_landmarks";
constexpr int kMicroSecondsPerMilliSecond = 1000; constexpr int kMicroSecondsPerMilliSecond = 1000;
// Creates a MediaPipe graph config that contains a subgraph node of // Creates a MediaPipe graph config that contains a subgraph node of
@ -83,9 +81,6 @@ CalculatorGraphConfig CreateGraphConfig(
graph.Out(kNormLandmarksTag); graph.Out(kNormLandmarksTag);
subgraph.Out(kPoseWorldLandmarksTag).SetName(kPoseWorldLandmarksStreamName) >> subgraph.Out(kPoseWorldLandmarksTag).SetName(kPoseWorldLandmarksStreamName) >>
graph.Out(kPoseWorldLandmarksTag); graph.Out(kPoseWorldLandmarksTag);
subgraph.Out(kPoseAuxiliaryLandmarksTag)
.SetName(kPoseAuxiliaryLandmarksStreamName) >>
graph.Out(kPoseAuxiliaryLandmarksTag);
subgraph.Out(kImageTag).SetName(kImageOutStreamName) >> graph.Out(kImageTag); subgraph.Out(kImageTag).SetName(kImageOutStreamName) >> graph.Out(kImageTag);
if (output_segmentation_masks) { if (output_segmentation_masks) {
subgraph.Out(kSegmentationMaskTag).SetName(kSegmentationMaskStreamName) >> subgraph.Out(kSegmentationMaskTag).SetName(kSegmentationMaskStreamName) >>
@ -163,8 +158,6 @@ absl::StatusOr<std::unique_ptr<PoseLandmarker>> PoseLandmarker::Create(
status_or_packets.value()[kNormLandmarksStreamName]; status_or_packets.value()[kNormLandmarksStreamName];
Packet pose_world_landmarks_packet = Packet pose_world_landmarks_packet =
status_or_packets.value()[kPoseWorldLandmarksStreamName]; status_or_packets.value()[kPoseWorldLandmarksStreamName];
Packet pose_auxiliary_landmarks_packet =
status_or_packets.value()[kPoseAuxiliaryLandmarksStreamName];
std::optional<std::vector<Image>> segmentation_mask = std::nullopt; std::optional<std::vector<Image>> segmentation_mask = std::nullopt;
if (output_segmentation_masks) { if (output_segmentation_masks) {
segmentation_mask = segmentation_mask_packet.Get<std::vector<Image>>(); segmentation_mask = segmentation_mask_packet.Get<std::vector<Image>>();
@ -175,9 +168,7 @@ absl::StatusOr<std::unique_ptr<PoseLandmarker>> PoseLandmarker::Create(
/* pose_landmarks= */ /* pose_landmarks= */
pose_landmarks_packet.Get<std::vector<NormalizedLandmarkList>>(), pose_landmarks_packet.Get<std::vector<NormalizedLandmarkList>>(),
/* pose_world_landmarks= */ /* pose_world_landmarks= */
pose_world_landmarks_packet.Get<std::vector<LandmarkList>>(), pose_world_landmarks_packet.Get<std::vector<LandmarkList>>()),
pose_auxiliary_landmarks_packet
.Get<std::vector<NormalizedLandmarkList>>()),
image_packet.Get<Image>(), image_packet.Get<Image>(),
pose_landmarks_packet.Timestamp().Value() / pose_landmarks_packet.Timestamp().Value() /
kMicroSecondsPerMilliSecond); kMicroSecondsPerMilliSecond);
@ -234,10 +225,7 @@ absl::StatusOr<PoseLandmarkerResult> PoseLandmarker::Detect(
.Get<std::vector<mediapipe::NormalizedLandmarkList>>(), .Get<std::vector<mediapipe::NormalizedLandmarkList>>(),
/* pose_world_landmarks */ /* pose_world_landmarks */
output_packets[kPoseWorldLandmarksStreamName] output_packets[kPoseWorldLandmarksStreamName]
.Get<std::vector<mediapipe::LandmarkList>>(), .Get<std::vector<mediapipe::LandmarkList>>());
/*pose_auxiliary_landmarks= */
output_packets[kPoseAuxiliaryLandmarksStreamName]
.Get<std::vector<mediapipe::NormalizedLandmarkList>>());
} }
absl::StatusOr<PoseLandmarkerResult> PoseLandmarker::DetectForVideo( absl::StatusOr<PoseLandmarkerResult> PoseLandmarker::DetectForVideo(
@ -277,10 +265,7 @@ absl::StatusOr<PoseLandmarkerResult> PoseLandmarker::DetectForVideo(
.Get<std::vector<mediapipe::NormalizedLandmarkList>>(), .Get<std::vector<mediapipe::NormalizedLandmarkList>>(),
/* pose_world_landmarks */ /* pose_world_landmarks */
output_packets[kPoseWorldLandmarksStreamName] output_packets[kPoseWorldLandmarksStreamName]
.Get<std::vector<mediapipe::LandmarkList>>(), .Get<std::vector<mediapipe::LandmarkList>>());
/* pose_auxiliary_landmarks= */
output_packets[kPoseAuxiliaryLandmarksStreamName]
.Get<std::vector<mediapipe::NormalizedLandmarkList>>());
} }
absl::Status PoseLandmarker::DetectAsync( absl::Status PoseLandmarker::DetectAsync(

----

@ -27,15 +27,12 @@ namespace pose_landmarker {
PoseLandmarkerResult ConvertToPoseLandmarkerResult( PoseLandmarkerResult ConvertToPoseLandmarkerResult(
std::optional<std::vector<mediapipe::Image>> segmentation_masks, std::optional<std::vector<mediapipe::Image>> segmentation_masks,
const std::vector<mediapipe::NormalizedLandmarkList>& pose_landmarks_proto, const std::vector<mediapipe::NormalizedLandmarkList>& pose_landmarks_proto,
const std::vector<mediapipe::LandmarkList>& pose_world_landmarks_proto, const std::vector<mediapipe::LandmarkList>& pose_world_landmarks_proto) {
const std::vector<mediapipe::NormalizedLandmarkList>&
pose_auxiliary_landmarks_proto) {
PoseLandmarkerResult result; PoseLandmarkerResult result;
result.segmentation_masks = segmentation_masks; result.segmentation_masks = segmentation_masks;
result.pose_landmarks.resize(pose_landmarks_proto.size()); result.pose_landmarks.resize(pose_landmarks_proto.size());
result.pose_world_landmarks.resize(pose_world_landmarks_proto.size()); result.pose_world_landmarks.resize(pose_world_landmarks_proto.size());
result.pose_auxiliary_landmarks.resize(pose_auxiliary_landmarks_proto.size());
std::transform(pose_landmarks_proto.begin(), pose_landmarks_proto.end(), std::transform(pose_landmarks_proto.begin(), pose_landmarks_proto.end(),
result.pose_landmarks.begin(), result.pose_landmarks.begin(),
components::containers::ConvertToNormalizedLandmarks); components::containers::ConvertToNormalizedLandmarks);
@ -43,10 +40,6 @@ PoseLandmarkerResult ConvertToPoseLandmarkerResult(
pose_world_landmarks_proto.end(), pose_world_landmarks_proto.end(),
result.pose_world_landmarks.begin(), result.pose_world_landmarks.begin(),
components::containers::ConvertToLandmarks); components::containers::ConvertToLandmarks);
std::transform(pose_auxiliary_landmarks_proto.begin(),
pose_auxiliary_landmarks_proto.end(),
result.pose_auxiliary_landmarks.begin(),
components::containers::ConvertToNormalizedLandmarks);
return result; return result;
} }

----

@ -37,17 +37,12 @@ struct PoseLandmarkerResult {
std::vector<components::containers::NormalizedLandmarks> pose_landmarks; std::vector<components::containers::NormalizedLandmarks> pose_landmarks;
// Detected pose landmarks in world coordinates. // Detected pose landmarks in world coordinates.
std::vector<components::containers::Landmarks> pose_world_landmarks; std::vector<components::containers::Landmarks> pose_world_landmarks;
// Detected auxiliary landmarks, used for deriving ROI for next frame.
std::vector<components::containers::NormalizedLandmarks>
pose_auxiliary_landmarks;
}; };
PoseLandmarkerResult ConvertToPoseLandmarkerResult( PoseLandmarkerResult ConvertToPoseLandmarkerResult(
std::optional<std::vector<mediapipe::Image>> segmentation_mask, std::optional<std::vector<mediapipe::Image>> segmentation_mask,
const std::vector<mediapipe::NormalizedLandmarkList>& pose_landmarks_proto, const std::vector<mediapipe::NormalizedLandmarkList>& pose_landmarks_proto,
const std::vector<mediapipe::LandmarkList>& pose_world_landmarks_proto, const std::vector<mediapipe::LandmarkList>& pose_world_landmarks_proto);
const std::vector<mediapipe::NormalizedLandmarkList>&
pose_auxiliary_landmarks_proto);
} // namespace pose_landmarker } // namespace pose_landmarker
} // namespace vision } // namespace vision
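After this cleanup a result carries only the segmentation masks, the normalized landmarks, and the world landmarks; the auxiliary landmarks were an implementation detail used to derive the next frame's ROI and are no longer surfaced. Below is an illustrative (not official) Python dataclass mirroring the trimmed struct.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Landmark = Tuple[float, float, float]

@dataclass
class PoseLandmarkerResultSketch:
    """Illustrative mirror of the trimmed C++ PoseLandmarkerResult."""
    pose_landmarks: List[List[Landmark]] = field(default_factory=list)
    pose_world_landmarks: List[List[Landmark]] = field(default_factory=list)
    segmentation_masks: Optional[list] = None  # one mask per detected pose

result = PoseLandmarkerResultSketch(
    pose_landmarks=[[(0.5, 0.5, 0.0)]],
    pose_world_landmarks=[[(0.1, -0.2, 0.3)]],
)
print(len(result.pose_landmarks), result.segmentation_masks)  # 1 None
```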

----

@ -47,13 +47,6 @@ TEST(ConvertFromProto, Succeeds) {
landmark_proto.set_y(5.2); landmark_proto.set_y(5.2);
landmark_proto.set_z(4.3); landmark_proto.set_z(4.3);
mediapipe::NormalizedLandmarkList auxiliary_landmark_list_proto;
mediapipe::NormalizedLandmark& auxiliary_landmark_proto =
*auxiliary_landmark_list_proto.add_landmark();
auxiliary_landmark_proto.set_x(0.5);
auxiliary_landmark_proto.set_y(0.5);
auxiliary_landmark_proto.set_z(0.5);
std::vector<Image> segmentation_masks_lists = {segmentation_mask}; std::vector<Image> segmentation_masks_lists = {segmentation_mask};
std::vector<mediapipe::NormalizedLandmarkList> normalized_landmarks_lists = { std::vector<mediapipe::NormalizedLandmarkList> normalized_landmarks_lists = {
@ -62,12 +55,9 @@ TEST(ConvertFromProto, Succeeds) {
std::vector<mediapipe::LandmarkList> world_landmarks_lists = { std::vector<mediapipe::LandmarkList> world_landmarks_lists = {
world_landmark_list_proto}; world_landmark_list_proto};
std::vector<mediapipe::NormalizedLandmarkList> auxiliary_landmarks_lists = {
auxiliary_landmark_list_proto};
PoseLandmarkerResult pose_landmarker_result = ConvertToPoseLandmarkerResult( PoseLandmarkerResult pose_landmarker_result = ConvertToPoseLandmarkerResult(
segmentation_masks_lists, normalized_landmarks_lists, segmentation_masks_lists, normalized_landmarks_lists,
world_landmarks_lists, auxiliary_landmarks_lists); world_landmarks_lists);
EXPECT_EQ(pose_landmarker_result.pose_landmarks.size(), 1); EXPECT_EQ(pose_landmarker_result.pose_landmarks.size(), 1);
EXPECT_EQ(pose_landmarker_result.pose_landmarks[0].landmarks.size(), 1); EXPECT_EQ(pose_landmarker_result.pose_landmarks[0].landmarks.size(), 1);
@ -82,14 +72,6 @@ TEST(ConvertFromProto, Succeeds) {
testing::FieldsAre(testing::FloatEq(3.1), testing::FloatEq(5.2), testing::FieldsAre(testing::FloatEq(3.1), testing::FloatEq(5.2),
testing::FloatEq(4.3), std::nullopt, testing::FloatEq(4.3), std::nullopt,
std::nullopt, std::nullopt)); std::nullopt, std::nullopt));
EXPECT_EQ(pose_landmarker_result.pose_auxiliary_landmarks.size(), 1);
EXPECT_EQ(pose_landmarker_result.pose_auxiliary_landmarks[0].landmarks.size(),
1);
EXPECT_THAT(pose_landmarker_result.pose_auxiliary_landmarks[0].landmarks[0],
testing::FieldsAre(testing::FloatEq(0.5), testing::FloatEq(0.5),
testing::FloatEq(0.5), std::nullopt,
std::nullopt, std::nullopt));
} }
} // namespace pose_landmarker } // namespace pose_landmarker

----

@ -24,7 +24,7 @@
return [NSString stringWithCString:text.c_str() encoding:[NSString defaultCStringEncoding]]; return [NSString stringWithCString:text.c_str() encoding:[NSString defaultCStringEncoding]];
} }
+ (NSString *)uuidString{ + (NSString *)uuidString {
return [[NSUUID UUID] UUIDString]; return [[NSUUID UUID] UUIDString];
} }

----

@ -28,7 +28,12 @@
return self; return self;
} }
// TODO: Implement hash - (NSUInteger)hash {
NSUInteger nonNullPropertiesHash =
@(self.location.x).hash ^ @(self.location.y).hash ^ @(self.score).hash;
return self.label ? nonNullPropertiesHash ^ self.label.hash : nonNullPropertiesHash;
}
- (BOOL)isEqual:(nullable id)object { - (BOOL)isEqual:(nullable id)object {
if (!object) { if (!object) {

----

@ -452,7 +452,8 @@ static NSString *const kLiveStreamTestsDictExpectationKey = @"expectation";
[self [self
assertCreateImageClassifierWithOptions:options assertCreateImageClassifierWithOptions:options
failsWithExpectedError: failsWithExpectedError:
[NSError errorWithDomain:kExpectedErrorDomain [NSError
errorWithDomain:kExpectedErrorDomain
code:MPPTasksErrorCodeInvalidArgumentError code:MPPTasksErrorCodeInvalidArgumentError
userInfo:@{ userInfo:@{
NSLocalizedDescriptionKey : NSLocalizedDescriptionKey :
@ -469,14 +470,14 @@ static NSString *const kLiveStreamTestsDictExpectationKey = @"expectation";
[self assertCreateImageClassifierWithOptions:options [self assertCreateImageClassifierWithOptions:options
failsWithExpectedError: failsWithExpectedError:
[NSError [NSError errorWithDomain:kExpectedErrorDomain
errorWithDomain:kExpectedErrorDomain
code:MPPTasksErrorCodeInvalidArgumentError code:MPPTasksErrorCodeInvalidArgumentError
userInfo:@{ userInfo:@{
NSLocalizedDescriptionKey : NSLocalizedDescriptionKey :
@"The vision task is in live stream mode. An object " @"The vision task is in live stream mode. An "
@"must be set as the delegate of the task in its " @"object must be set as the delegate of the task "
@"options to ensure asynchronous delivery of results." @"in its options to ensure asynchronous delivery "
@"of results."
}]]; }]];
} }

----

@ -25,6 +25,8 @@ static NSDictionary *const kCatsAndDogsRotatedImage =
static NSString *const kExpectedErrorDomain = @"com.google.mediapipe.tasks"; static NSString *const kExpectedErrorDomain = @"com.google.mediapipe.tasks";
static const float pixelDifferenceTolerance = 10.0f; static const float pixelDifferenceTolerance = 10.0f;
static const float scoreDifferenceTolerance = 0.02f; static const float scoreDifferenceTolerance = 0.02f;
static NSString *const kLiveStreamTestsDictObjectDetectorKey = @"object_detector";
static NSString *const kLiveStreamTestsDictExpectationKey = @"expectation";
#define AssertEqualErrors(error, expectedError) \ #define AssertEqualErrors(error, expectedError) \
XCTAssertNotNil(error); \ XCTAssertNotNil(error); \
@ -58,7 +60,10 @@ static const float scoreDifferenceTolerance = 0.02f;
XCTAssertEqualWithAccuracy(boundingBox.size.height, expectedBoundingBox.size.height, \ XCTAssertEqualWithAccuracy(boundingBox.size.height, expectedBoundingBox.size.height, \
pixelDifferenceTolerance, @"index i = %d", idx); pixelDifferenceTolerance, @"index i = %d", idx);
@interface MPPObjectDetectorTests : XCTestCase @interface MPPObjectDetectorTests : XCTestCase <MPPObjectDetectorLiveStreamDelegate> {
NSDictionary *liveStreamSucceedsTestDict;
NSDictionary *outOfOrderTimestampTestDict;
}
@end @end
@implementation MPPObjectDetectorTests @implementation MPPObjectDetectorTests
@ -446,31 +451,28 @@ static const float scoreDifferenceTolerance = 0.02f;
#pragma mark Running Mode Tests #pragma mark Running Mode Tests
- (void)testCreateObjectDetectorFailsWithResultListenerInNonLiveStreamMode { - (void)testCreateObjectDetectorFailsWithDelegateInNonLiveStreamMode {
MPPRunningMode runningModesToTest[] = {MPPRunningModeImage, MPPRunningModeVideo}; MPPRunningMode runningModesToTest[] = {MPPRunningModeImage, MPPRunningModeVideo};
for (int i = 0; i < sizeof(runningModesToTest) / sizeof(runningModesToTest[0]); i++) { for (int i = 0; i < sizeof(runningModesToTest) / sizeof(runningModesToTest[0]); i++) {
MPPObjectDetectorOptions *options = [self objectDetectorOptionsWithModelName:kModelName]; MPPObjectDetectorOptions *options = [self objectDetectorOptionsWithModelName:kModelName];
options.runningMode = runningModesToTest[i]; options.runningMode = runningModesToTest[i];
options.completion = options.objectDetectorLiveStreamDelegate = self;
^(MPPObjectDetectionResult *result, NSInteger timestampInMilliseconds, NSError *error) {
};
[self [self
assertCreateObjectDetectorWithOptions:options assertCreateObjectDetectorWithOptions:options
failsWithExpectedError: failsWithExpectedError:
[NSError [NSError errorWithDomain:kExpectedErrorDomain
errorWithDomain:kExpectedErrorDomain
code:MPPTasksErrorCodeInvalidArgumentError code:MPPTasksErrorCodeInvalidArgumentError
userInfo:@{ userInfo:@{
NSLocalizedDescriptionKey : NSLocalizedDescriptionKey :
@"The vision task is in image or video mode, a " @"The vision task is in image or video mode. The "
@"user-defined result callback should not be provided." @"delegate must not be set in the task's options."
}]]; }]];
} }
} }
- (void)testCreateObjectDetectorFailsWithMissingResultListenerInLiveStreamMode { - (void)testCreateObjectDetectorFailsWithMissingDelegateInLiveStreamMode {
MPPObjectDetectorOptions *options = [self objectDetectorOptionsWithModelName:kModelName]; MPPObjectDetectorOptions *options = [self objectDetectorOptionsWithModelName:kModelName];
options.runningMode = MPPRunningModeLiveStream; options.runningMode = MPPRunningModeLiveStream;
@ -481,8 +483,10 @@ static const float scoreDifferenceTolerance = 0.02f;
code:MPPTasksErrorCodeInvalidArgumentError code:MPPTasksErrorCodeInvalidArgumentError
userInfo:@{ userInfo:@{
NSLocalizedDescriptionKey : NSLocalizedDescriptionKey :
@"The vision task is in live stream mode, a " @"The vision task is in live stream mode. An "
@"user-defined result callback must be provided." @"object must be set as the delegate of the task "
@"in its options to ensure asynchronous delivery "
@"of results."
}]]; }]];
} }
@ -563,10 +567,7 @@ static const float scoreDifferenceTolerance = 0.02f;
MPPObjectDetectorOptions *options = [self objectDetectorOptionsWithModelName:kModelName]; MPPObjectDetectorOptions *options = [self objectDetectorOptionsWithModelName:kModelName];
options.runningMode = MPPRunningModeLiveStream; options.runningMode = MPPRunningModeLiveStream;
options.completion = options.objectDetectorLiveStreamDelegate = self;
^(MPPObjectDetectionResult *result, NSInteger timestampInMilliseconds, NSError *error) {
};
MPPObjectDetector *objectDetector = [self objectDetectorWithOptionsSucceeds:options]; MPPObjectDetector *objectDetector = [self objectDetectorWithOptionsSucceeds:options];
@ -631,23 +632,17 @@ static const float scoreDifferenceTolerance = 0.02f;
options.maxResults = maxResults; options.maxResults = maxResults;
options.runningMode = MPPRunningModeLiveStream; options.runningMode = MPPRunningModeLiveStream;
options.objectDetectorLiveStreamDelegate = self;
XCTestExpectation *expectation = [[XCTestExpectation alloc] XCTestExpectation *expectation = [[XCTestExpectation alloc]
initWithDescription:@"detectWithOutOfOrderTimestampsAndLiveStream"]; initWithDescription:@"detectWithOutOfOrderTimestampsAndLiveStream"];
expectation.expectedFulfillmentCount = 1; expectation.expectedFulfillmentCount = 1;
options.completion =
^(MPPObjectDetectionResult *result, NSInteger timestampInMilliseconds, NSError *error) {
[self assertObjectDetectionResult:result
isEqualToExpectedResult:
[MPPObjectDetectorTests
expectedDetectionResultForCatsAndDogsImageWithTimestampInMilliseconds:
timestampInMilliseconds]
expectedDetectionsCount:maxResults];
[expectation fulfill];
};
MPPObjectDetector *objectDetector = [self objectDetectorWithOptionsSucceeds:options]; MPPObjectDetector *objectDetector = [self objectDetectorWithOptionsSucceeds:options];
liveStreamSucceedsTestDict = @{
kLiveStreamTestsDictObjectDetectorKey : objectDetector,
kLiveStreamTestsDictExpectationKey : expectation
};
MPPImage *image = [self imageWithFileInfo:kCatsAndDogsImage]; MPPImage *image = [self imageWithFileInfo:kCatsAndDogsImage];
@ -695,19 +690,15 @@ static const float scoreDifferenceTolerance = 0.02f;
expectation.expectedFulfillmentCount = iterationCount + 1; expectation.expectedFulfillmentCount = iterationCount + 1;
expectation.inverted = YES; expectation.inverted = YES;
options.completion = options.objectDetectorLiveStreamDelegate = self;
^(MPPObjectDetectionResult *result, NSInteger timestampInMilliseconds, NSError *error) {
[self assertObjectDetectionResult:result
isEqualToExpectedResult:
[MPPObjectDetectorTests
expectedDetectionResultForCatsAndDogsImageWithTimestampInMilliseconds:
timestampInMilliseconds]
expectedDetectionsCount:maxResults];
[expectation fulfill];
};
MPPObjectDetector *objectDetector = [self objectDetectorWithOptionsSucceeds:options]; MPPObjectDetector *objectDetector = [self objectDetectorWithOptionsSucceeds:options];
liveStreamSucceedsTestDict = @{
kLiveStreamTestsDictObjectDetectorKey : objectDetector,
kLiveStreamTestsDictExpectationKey : expectation
};
// TODO: Mimic initialization from CMSampleBuffer as live stream mode is most likely to be used // TODO: Mimic initialization from CMSampleBuffer as live stream mode is most likely to be used
// with the iOS camera. AVCaptureVideoDataOutput sample buffer delegates provide frames of type // with the iOS camera. AVCaptureVideoDataOutput sample buffer delegates provide frames of type
// `CMSampleBuffer`. // `CMSampleBuffer`.
@ -721,4 +712,24 @@ static const float scoreDifferenceTolerance = 0.02f;
[self waitForExpectations:@[ expectation ] timeout:timeout]; [self waitForExpectations:@[ expectation ] timeout:timeout];
} }
#pragma mark MPPObjectDetectorLiveStreamDelegate Methods
- (void)objectDetector:(MPPObjectDetector *)objectDetector
didFinishDetectionWithResult:(MPPObjectDetectionResult *)objectDetectionResult
timestampInMilliseconds:(NSInteger)timestampInMilliseconds
error:(NSError *)error {
NSInteger maxResults = 4;
[self assertObjectDetectionResult:objectDetectionResult
isEqualToExpectedResult:
[MPPObjectDetectorTests
expectedDetectionResultForCatsAndDogsImageWithTimestampInMilliseconds:
timestampInMilliseconds]
expectedDetectionsCount:maxResults];
if (objectDetector == outOfOrderTimestampTestDict[kLiveStreamTestsDictObjectDetectorKey]) {
[outOfOrderTimestampTestDict[kLiveStreamTestsDictExpectationKey] fulfill];
} else if (objectDetector == liveStreamSucceedsTestDict[kLiveStreamTestsDictObjectDetectorKey]) {
[liveStreamSucceedsTestDict[kLiveStreamTestsDictExpectationKey] fulfill];
}
}
@end @end
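The delegate-based delivery tested here mirrors how other MediaPipe Tasks front ends expose live-stream results; for comparison, the Python API takes a `result_callback` in `LIVE_STREAM` mode. A hedged sketch follows (paths are placeholders, and feeding a still image through `detect_async` is only for illustration).

```python
import mediapipe as mp
from mediapipe.tasks import python as mp_tasks
from mediapipe.tasks.python import vision

def on_result(result, image, timestamp_ms):
    # Invoked asynchronously, playing the role of
    # objectDetector:didFinishDetectionWithResult:timestampInMilliseconds:error:.
    print(timestamp_ms, len(result.detections))

options = vision.ObjectDetectorOptions(
    base_options=mp_tasks.BaseOptions(model_asset_path="efficientdet_lite0.tflite"),
    running_mode=vision.RunningMode.LIVE_STREAM,
    max_results=4,
    result_callback=on_result,
)
with vision.ObjectDetector.create_from_options(options) as detector:
    frame = mp.Image.create_from_file("cats_and_dogs.jpg")
    detector.detect_async(frame, 0)  # timestamps must be monotonically increasing
```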

----

@ -63,5 +63,10 @@ objc_library(
"//third_party/apple_frameworks:UIKit", "//third_party/apple_frameworks:UIKit",
"@com_google_absl//absl/status:statusor", "@com_google_absl//absl/status:statusor",
"@ios_opencv//:OpencvFramework", "@ios_opencv//:OpencvFramework",
], ] + select({
"@//third_party:opencv_ios_sim_arm64_source_build": ["@ios_opencv_source//:opencv_xcframework"],
"@//third_party:opencv_ios_sim_fat_source_build": ["@ios_opencv_source//:opencv_xcframework"],
"@//third_party:opencv_ios_arm64_source_build": ["@ios_opencv_source//:opencv_xcframework"],
"//conditions:default": [],
}),
) )

----

@ -96,6 +96,15 @@ NS_SWIFT_NAME(ObjectDetector)
* `MPPImage`. Only use this method when the `MPPObjectDetector` is created with * `MPPImage`. Only use this method when the `MPPObjectDetector` is created with
* `MPPRunningModeImage`. * `MPPRunningModeImage`.
* *
* This method supports object detection on RGBA images. If your `MPPImage` has a source type of
* `MPPImageSourceTypePixelBuffer` or `MPPImageSourceTypeSampleBuffer`, the underlying pixel buffer
* must have one of the following pixel format types:
* 1. kCVPixelFormatType_32BGRA
* 2. kCVPixelFormatType_32RGBA
*
* If your `MPPImage` has a source type of `MPPImageSourceTypeImage` ensure that the color space is
* RGB with an Alpha channel.
*
* @param image The `MPPImage` on which object detection is to be performed. * @param image The `MPPImage` on which object detection is to be performed.
* @param error An optional error parameter populated when there is an error in performing object * @param error An optional error parameter populated when there is an error in performing object
* detection on the input image. * detection on the input image.
@ -115,6 +124,15 @@ NS_SWIFT_NAME(ObjectDetector)
* the provided `MPPImage`. Only use this method when the `MPPObjectDetector` is created with * the provided `MPPImage`. Only use this method when the `MPPObjectDetector` is created with
* `MPPRunningModeVideo`. * `MPPRunningModeVideo`.
* *
* This method supports object detection on RGBA images. If your `MPPImage` has a source type of
* `MPPImageSourceTypePixelBuffer` or `MPPImageSourceTypeSampleBuffer`, the underlying pixel buffer
* must have one of the following pixel format types:
* 1. kCVPixelFormatType_32BGRA
* 2. kCVPixelFormatType_32RGBA
*
* If your `MPPImage` has a source type of `MPPImageSourceTypeImage` ensure that the color space is
* RGB with an Alpha channel.
*
* @param image The `MPPImage` on which object detection is to be performed. * @param image The `MPPImage` on which object detection is to be performed.
* @param timestampInMilliseconds The video frame's timestamp (in milliseconds). The input * @param timestampInMilliseconds The video frame's timestamp (in milliseconds). The input
* timestamps must be monotonically increasing. * timestamps must be monotonically increasing.
@ -135,12 +153,28 @@ NS_SWIFT_NAME(ObjectDetector)
* Sends live stream image data of type `MPPImage` to perform object detection using the whole * Sends live stream image data of type `MPPImage` to perform object detection using the whole
* image as region of interest. Rotation will be applied according to the `orientation` property of * image as region of interest. Rotation will be applied according to the `orientation` property of
* the provided `MPPImage`. Only use this method when the `MPPObjectDetector` is created with * the provided `MPPImage`. Only use this method when the `MPPObjectDetector` is created with
* `MPPRunningModeLiveStream`. Results are provided asynchronously via the `completion` callback * `MPPRunningModeLiveStream`.
* provided in the `MPPObjectDetectorOptions`. *
* The object which needs to be continuously notified of the available results of object
* detection must conform to the `MPPObjectDetectorLiveStreamDelegate` protocol and implement the
* `objectDetector:didFinishDetectionWithResult:timestampInMilliseconds:error:` delegate method.
* *
* It's required to provide a timestamp (in milliseconds) to indicate when the input image is sent * It's required to provide a timestamp (in milliseconds) to indicate when the input image is sent
* to the object detector. The input timestamps must be monotonically increasing. * to the object detector. The input timestamps must be monotonically increasing.
* *
* This method supports object detection on RGBA images. If your `MPPImage` has a source type of
* `MPPImageSourceTypePixelBuffer` or `MPPImageSourceTypeSampleBuffer`, the underlying pixel buffer
* must have one of the following pixel format types:
* 1. kCVPixelFormatType_32BGRA
* 2. kCVPixelFormatType_32RGBA
*
* If the input `MPPImage` has a source type of `MPPImageSourceTypeImage` ensure that the color
* space is RGB with an Alpha channel.
*
* If this method is used for detecting objects in live camera frames using `AVFoundation`, ensure that you
* request `AVCaptureVideoDataOutput` to output frames in `kCMPixelFormat_32RGBA` using its
* `videoSettings` property.
*
* @param image A live stream image data of type `MPPImage` on which object detection is to be * @param image A live stream image data of type `MPPImage` on which object detection is to be
* performed. * performed.
* @param timestampInMilliseconds The timestamp (in milliseconds) which indicates when the input * @param timestampInMilliseconds The timestamp (in milliseconds) which indicates when the input

----

@ -37,8 +37,8 @@ static NSString *const kImageOutStreamName = @"image_out";
static NSString *const kImageTag = @"IMAGE"; static NSString *const kImageTag = @"IMAGE";
static NSString *const kNormRectStreamName = @"norm_rect_in"; static NSString *const kNormRectStreamName = @"norm_rect_in";
static NSString *const kNormRectTag = @"NORM_RECT"; static NSString *const kNormRectTag = @"NORM_RECT";
static NSString *const kTaskGraphName = @"mediapipe.tasks.vision.ObjectDetectorGraph"; static NSString *const kTaskGraphName = @"mediapipe.tasks.vision.ObjectDetectorGraph";
static NSString *const kTaskName = @"objectDetector";
#define InputPacketMap(imagePacket, normalizedRectPacket) \ #define InputPacketMap(imagePacket, normalizedRectPacket) \
{ \ { \
@ -51,6 +51,7 @@ static NSString *const kTaskGraphName = @"mediapipe.tasks.vision.ObjectDetectorG
/** iOS Vision Task Runner */ /** iOS Vision Task Runner */
MPPVisionTaskRunner *_visionTaskRunner; MPPVisionTaskRunner *_visionTaskRunner;
} }
@property(nonatomic, weak) id<MPPObjectDetectorLiveStreamDelegate> objectDetectorLiveStreamDelegate;
@end @end
@implementation MPPObjectDetector @implementation MPPObjectDetector
@ -78,11 +79,37 @@ static NSString *const kTaskGraphName = @"mediapipe.tasks.vision.ObjectDetectorG
PacketsCallback packetsCallback = nullptr; PacketsCallback packetsCallback = nullptr;
if (options.completion) { if (options.objectDetectorLiveStreamDelegate) {
_objectDetectorLiveStreamDelegate = options.objectDetectorLiveStreamDelegate;
// Capture `self` weakly in order to avoid keeping `self` in memory
// and causing a retain cycle after `self` is set to `nil`.
MPPObjectDetector *__weak weakSelf = self;
// Create a private serial dispatch queue in which the delegate method will be called
// asynchronously. This is to ensure that if the client performs a long running operation in
// the delegate method, the queue on which the C++ callbacks are invoked is not blocked and is
// freed up to continue with its operations.
dispatch_queue_t callbackQueue = dispatch_queue_create(
[MPPVisionTaskRunner uniqueDispatchQueueNameWithSuffix:kTaskName], NULL);
packetsCallback = [=](absl::StatusOr<PacketMap> statusOrPackets) { packetsCallback = [=](absl::StatusOr<PacketMap> statusOrPackets) {
if (!weakSelf) {
return;
}
if (![weakSelf.objectDetectorLiveStreamDelegate
respondsToSelector:@selector
(objectDetector:didFinishDetectionWithResult:timestampInMilliseconds:error:)]) {
return;
}
NSError *callbackError = nil; NSError *callbackError = nil;
if (![MPPCommonUtils checkCppError:statusOrPackets.status() toError:&callbackError]) { if (![MPPCommonUtils checkCppError:statusOrPackets.status() toError:&callbackError]) {
options.completion(nil, Timestamp::Unset().Value(), callbackError); dispatch_async(callbackQueue, ^{
[weakSelf.objectDetectorLiveStreamDelegate objectDetector:weakSelf
didFinishDetectionWithResult:nil
timestampInMilliseconds:Timestamp::Unset().Value()
error:callbackError];
});
return; return;
} }
@ -95,10 +122,15 @@ static NSString *const kTaskGraphName = @"mediapipe.tasks.vision.ObjectDetectorG
objectDetectionResultWithDetectionsPacket:statusOrPackets.value()[kDetectionsStreamName objectDetectionResultWithDetectionsPacket:statusOrPackets.value()[kDetectionsStreamName
.cppString]]; .cppString]];
options.completion(result, NSInteger timeStampInMilliseconds =
outputPacketMap[kImageOutStreamName.cppString].Timestamp().Value() / outputPacketMap[kImageOutStreamName.cppString].Timestamp().Value() /
kMicroSecondsPerMilliSecond, kMicroSecondsPerMilliSecond;
callbackError); dispatch_async(callbackQueue, ^{
[weakSelf.objectDetectorLiveStreamDelegate objectDetector:weakSelf
didFinishDetectionWithResult:result
timestampInMilliseconds:timeStampInMilliseconds
error:callbackError];
});
}; };
} }
@ -112,6 +144,7 @@ static NSString *const kTaskGraphName = @"mediapipe.tasks.vision.ObjectDetectorG
return nil; return nil;
} }
} }
return self; return self;
} }
@ -224,5 +257,4 @@ static NSString *const kTaskGraphName = @"mediapipe.tasks.vision.ObjectDetectorG
return [_visionTaskRunner processLiveStreamPacketMap:inputPacketMap.value() error:error]; return [_visionTaskRunner processLiveStreamPacketMap:inputPacketMap.value() error:error];
} }
@end @end

----

@ -20,19 +20,70 @@
NS_ASSUME_NONNULL_BEGIN NS_ASSUME_NONNULL_BEGIN
@class MPPObjectDetector;
/**
* This protocol defines an interface for the delegates of `MPPObjectDetector` object to receive
* results of performing asynchronous object detection on images (i.e., when `runningMode` =
* `MPPRunningModeLiveStream`).
*
* The delegate of `MPPObjectDetector` must adopt `MPPObjectDetectorLiveStreamDelegate` protocol.
* The methods in this protocol are optional.
*/
NS_SWIFT_NAME(ObjectDetectorLiveStreamDelegate)
@protocol MPPObjectDetectorLiveStreamDelegate <NSObject>
@optional
/**
* This method notifies a delegate that the results of asynchronous object detection of
* an image submitted to the `MPPObjectDetector` are available.
*
* This method is called on a private serial dispatch queue created by the `MPPObjectDetector`
* for performing the asynchronous delegate calls.
*
* @param objectDetector The object detector which performed the object detection.
* This is useful to test equality when there are multiple instances of `MPPObjectDetector`.
* @param result The `MPPObjectDetectionResult` object that contains a list of detections, each
* detection has a bounding box that is expressed in the unrotated input frame of reference
* coordinates system, i.e. in `[0,image_width) x [0,image_height)`, which are the dimensions of the
* underlying image data.
* @param timestampInMilliseconds The timestamp (in milliseconds) which indicates when the input
* image was sent to the object detector.
* @param error An optional error parameter populated when there is an error in performing object
* detection on the input live stream image data.
*
*/
- (void)objectDetector:(MPPObjectDetector *)objectDetector
didFinishDetectionWithResult:(nullable MPPObjectDetectionResult *)result
timestampInMilliseconds:(NSInteger)timestampInMilliseconds
error:(nullable NSError *)error
NS_SWIFT_NAME(objectDetector(_:didFinishDetection:timestampInMilliseconds:error:));
@end
/** Options for setting up a `MPPObjectDetector`. */ /** Options for setting up a `MPPObjectDetector`. */
NS_SWIFT_NAME(ObjectDetectorOptions) NS_SWIFT_NAME(ObjectDetectorOptions)
@interface MPPObjectDetectorOptions : MPPTaskOptions <NSCopying> @interface MPPObjectDetectorOptions : MPPTaskOptions <NSCopying>
/**
* Running mode of the object detector task. Defaults to `MPPRunningModeImage`.
* `MPPObjectDetector` can be created with one of the following running modes:
* 1. `MPPRunningModeImage`: The mode for performing object detection on single image inputs.
* 2. `MPPRunningModeVideo`: The mode for performing object detection on the decoded frames of a
* video.
* 3. `MPPRunningModeLiveStream`: The mode for performing object detection on a live stream of
* input data, such as from the camera.
*/
@property(nonatomic) MPPRunningMode runningMode; @property(nonatomic) MPPRunningMode runningMode;
/** /**
* The user-defined result callback for processing live stream data. The result callback should only * An object that conforms to the `MPPObjectDetectorLiveStreamDelegate` protocol. This object must
* be specified when the running mode is set to the live stream mode. * implement `objectDetector:didFinishDetectionWithResult:timestampInMilliseconds:error:` to receive
* TODO: Add parameter `MPPImage` in the callback. * the results of performing asynchronous object detection on images (i.e, when `runningMode` =
* `MPPRunningModeLiveStream`).
*/ */
@property(nonatomic, copy) void (^completion) @property(nonatomic, weak, nullable) id<MPPObjectDetectorLiveStreamDelegate>
(MPPObjectDetectionResult *__nullable result, NSInteger timestampMs, NSError *error); objectDetectorLiveStreamDelegate;
/** /**
* The locale to use for display names specified through the TFLite Model Metadata, if any. Defaults * The locale to use for display names specified through the TFLite Model Metadata, if any. Defaults

View File

@ -33,7 +33,7 @@
objectDetectorOptions.categoryDenylist = self.categoryDenylist; objectDetectorOptions.categoryDenylist = self.categoryDenylist;
objectDetectorOptions.categoryAllowlist = self.categoryAllowlist; objectDetectorOptions.categoryAllowlist = self.categoryAllowlist;
objectDetectorOptions.displayNamesLocale = self.displayNamesLocale; objectDetectorOptions.displayNamesLocale = self.displayNamesLocale;
objectDetectorOptions.completion = self.completion; objectDetectorOptions.objectDetectorLiveStreamDelegate = self.objectDetectorLiveStreamDelegate;
return objectDetectorOptions; return objectDetectorOptions;
} }

View File

@ -39,6 +39,7 @@ import com.google.mediapipe.formats.proto.DetectionProto.Detection;
import java.io.File; import java.io.File;
import java.io.IOException; import java.io.IOException;
import java.nio.ByteBuffer; import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays; import java.util.Arrays;
import java.util.Collections; import java.util.Collections;
import java.util.List; import java.util.List;
@ -170,6 +171,13 @@ public final class ObjectDetector extends BaseVisionTaskApi {
new OutputHandler.OutputPacketConverter<ObjectDetectionResult, MPImage>() { new OutputHandler.OutputPacketConverter<ObjectDetectionResult, MPImage>() {
@Override @Override
public ObjectDetectionResult convertToTaskResult(List<Packet> packets) { public ObjectDetectionResult convertToTaskResult(List<Packet> packets) {
// If no objects are detected in the image, just return an empty result.
if (packets.get(DETECTIONS_OUT_STREAM_INDEX).isEmpty()) {
return ObjectDetectionResult.create(
new ArrayList<>(),
BaseVisionTaskApi.generateResultTimestampMs(
detectorOptions.runningMode(), packets.get(DETECTIONS_OUT_STREAM_INDEX)));
}
return ObjectDetectionResult.create( return ObjectDetectionResult.create(
PacketGetter.getProtoVector( PacketGetter.getProtoVector(
packets.get(DETECTIONS_OUT_STREAM_INDEX), Detection.parser()), packets.get(DETECTIONS_OUT_STREAM_INDEX), Detection.parser()),

View File

@ -79,8 +79,7 @@ public final class PoseLandmarker extends BaseVisionTaskApi {
private static final int LANDMARKS_OUT_STREAM_INDEX = 0; private static final int LANDMARKS_OUT_STREAM_INDEX = 0;
private static final int WORLD_LANDMARKS_OUT_STREAM_INDEX = 1; private static final int WORLD_LANDMARKS_OUT_STREAM_INDEX = 1;
private static final int AUXILIARY_LANDMARKS_OUT_STREAM_INDEX = 2; private static final int IMAGE_OUT_STREAM_INDEX = 2;
private static final int IMAGE_OUT_STREAM_INDEX = 3;
private static int segmentationMasksOutStreamIndex = -1; private static int segmentationMasksOutStreamIndex = -1;
private static final String TASK_GRAPH_NAME = private static final String TASK_GRAPH_NAME =
"mediapipe.tasks.vision.pose_landmarker.PoseLandmarkerGraph"; "mediapipe.tasks.vision.pose_landmarker.PoseLandmarkerGraph";
@ -145,7 +144,6 @@ public final class PoseLandmarker extends BaseVisionTaskApi {
List<String> outputStreams = new ArrayList<>(); List<String> outputStreams = new ArrayList<>();
outputStreams.add("NORM_LANDMARKS:pose_landmarks"); outputStreams.add("NORM_LANDMARKS:pose_landmarks");
outputStreams.add("WORLD_LANDMARKS:world_landmarks"); outputStreams.add("WORLD_LANDMARKS:world_landmarks");
outputStreams.add("AUXILIARY_LANDMARKS:auxiliary_landmarks");
outputStreams.add("IMAGE:image_out"); outputStreams.add("IMAGE:image_out");
if (landmarkerOptions.outputSegmentationMasks()) { if (landmarkerOptions.outputSegmentationMasks()) {
outputStreams.add("SEGMENTATION_MASK:segmentation_masks"); outputStreams.add("SEGMENTATION_MASK:segmentation_masks");
@ -161,7 +159,6 @@ public final class PoseLandmarker extends BaseVisionTaskApi {
// If no poses are detected in the image, just return empty lists. // If no poses are detected in the image, just return empty lists.
if (packets.get(LANDMARKS_OUT_STREAM_INDEX).isEmpty()) { if (packets.get(LANDMARKS_OUT_STREAM_INDEX).isEmpty()) {
return PoseLandmarkerResult.create( return PoseLandmarkerResult.create(
new ArrayList<>(),
new ArrayList<>(), new ArrayList<>(),
new ArrayList<>(), new ArrayList<>(),
Optional.empty(), Optional.empty(),
@ -179,9 +176,6 @@ public final class PoseLandmarker extends BaseVisionTaskApi {
packets.get(LANDMARKS_OUT_STREAM_INDEX), NormalizedLandmarkList.parser()), packets.get(LANDMARKS_OUT_STREAM_INDEX), NormalizedLandmarkList.parser()),
PacketGetter.getProtoVector( PacketGetter.getProtoVector(
packets.get(WORLD_LANDMARKS_OUT_STREAM_INDEX), LandmarkList.parser()), packets.get(WORLD_LANDMARKS_OUT_STREAM_INDEX), LandmarkList.parser()),
PacketGetter.getProtoVector(
packets.get(AUXILIARY_LANDMARKS_OUT_STREAM_INDEX),
NormalizedLandmarkList.parser()),
segmentedMasks, segmentedMasks,
BaseVisionTaskApi.generateResultTimestampMs( BaseVisionTaskApi.generateResultTimestampMs(
landmarkerOptions.runningMode(), packets.get(LANDMARKS_OUT_STREAM_INDEX))); landmarkerOptions.runningMode(), packets.get(LANDMARKS_OUT_STREAM_INDEX)));

View File

@ -40,7 +40,6 @@ public abstract class PoseLandmarkerResult implements TaskResult {
static PoseLandmarkerResult create( static PoseLandmarkerResult create(
List<LandmarkProto.NormalizedLandmarkList> landmarksProto, List<LandmarkProto.NormalizedLandmarkList> landmarksProto,
List<LandmarkProto.LandmarkList> worldLandmarksProto, List<LandmarkProto.LandmarkList> worldLandmarksProto,
List<LandmarkProto.NormalizedLandmarkList> auxiliaryLandmarksProto,
Optional<List<MPImage>> segmentationMasksData, Optional<List<MPImage>> segmentationMasksData,
long timestampMs) { long timestampMs) {
@ -52,7 +51,6 @@ public abstract class PoseLandmarkerResult implements TaskResult {
List<List<NormalizedLandmark>> multiPoseLandmarks = new ArrayList<>(); List<List<NormalizedLandmark>> multiPoseLandmarks = new ArrayList<>();
List<List<Landmark>> multiPoseWorldLandmarks = new ArrayList<>(); List<List<Landmark>> multiPoseWorldLandmarks = new ArrayList<>();
List<List<NormalizedLandmark>> multiPoseAuxiliaryLandmarks = new ArrayList<>();
for (LandmarkProto.NormalizedLandmarkList poseLandmarksProto : landmarksProto) { for (LandmarkProto.NormalizedLandmarkList poseLandmarksProto : landmarksProto) {
List<NormalizedLandmark> poseLandmarks = new ArrayList<>(); List<NormalizedLandmark> poseLandmarks = new ArrayList<>();
multiPoseLandmarks.add(poseLandmarks); multiPoseLandmarks.add(poseLandmarks);
@ -75,24 +73,10 @@ public abstract class PoseLandmarkerResult implements TaskResult {
poseWorldLandmarkProto.getZ())); poseWorldLandmarkProto.getZ()));
} }
} }
for (LandmarkProto.NormalizedLandmarkList poseAuxiliaryLandmarksProto :
auxiliaryLandmarksProto) {
List<NormalizedLandmark> poseAuxiliaryLandmarks = new ArrayList<>();
multiPoseAuxiliaryLandmarks.add(poseAuxiliaryLandmarks);
for (LandmarkProto.NormalizedLandmark poseAuxiliaryLandmarkProto :
poseAuxiliaryLandmarksProto.getLandmarkList()) {
poseAuxiliaryLandmarks.add(
NormalizedLandmark.create(
poseAuxiliaryLandmarkProto.getX(),
poseAuxiliaryLandmarkProto.getY(),
poseAuxiliaryLandmarkProto.getZ()));
}
}
return new AutoValue_PoseLandmarkerResult( return new AutoValue_PoseLandmarkerResult(
timestampMs, timestampMs,
Collections.unmodifiableList(multiPoseLandmarks), Collections.unmodifiableList(multiPoseLandmarks),
Collections.unmodifiableList(multiPoseWorldLandmarks), Collections.unmodifiableList(multiPoseWorldLandmarks),
Collections.unmodifiableList(multiPoseAuxiliaryLandmarks),
multiPoseSegmentationMasks); multiPoseSegmentationMasks);
} }
@ -105,9 +89,6 @@ public abstract class PoseLandmarkerResult implements TaskResult {
/** Pose landmarks in world coordinates of detected poses. */ /** Pose landmarks in world coordinates of detected poses. */
public abstract List<List<Landmark>> worldLandmarks(); public abstract List<List<Landmark>> worldLandmarks();
/** Pose auxiliary landmarks. */
public abstract List<List<NormalizedLandmark>> auxiliaryLandmarks();
/** Pose segmentation masks. */ /** Pose segmentation masks. */
public abstract Optional<List<MPImage>> segmentationMasks(); public abstract Optional<List<MPImage>> segmentationMasks();
} }

View File

@ -45,6 +45,7 @@ import org.junit.runners.Suite.SuiteClasses;
@SuiteClasses({ObjectDetectorTest.General.class, ObjectDetectorTest.RunningModeTest.class}) @SuiteClasses({ObjectDetectorTest.General.class, ObjectDetectorTest.RunningModeTest.class})
public class ObjectDetectorTest { public class ObjectDetectorTest {
private static final String MODEL_FILE = "coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.tflite"; private static final String MODEL_FILE = "coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.tflite";
private static final String NO_NMS_MODEL_FILE = "efficientdet_lite0_fp16_no_nms.tflite";
private static final String CAT_AND_DOG_IMAGE = "cats_and_dogs.jpg"; private static final String CAT_AND_DOG_IMAGE = "cats_and_dogs.jpg";
private static final String CAT_AND_DOG_ROTATED_IMAGE = "cats_and_dogs_rotated.jpg"; private static final String CAT_AND_DOG_ROTATED_IMAGE = "cats_and_dogs_rotated.jpg";
private static final int IMAGE_WIDTH = 1200; private static final int IMAGE_WIDTH = 1200;
@ -109,6 +110,20 @@ public class ObjectDetectorTest {
assertContainsOnlyCat(results, CAT_BOUNDING_BOX, CAT_SCORE); assertContainsOnlyCat(results, CAT_BOUNDING_BOX, CAT_SCORE);
} }
@Test
public void detect_succeedsWithNoObjectDetected() throws Exception {
ObjectDetectorOptions options =
ObjectDetectorOptions.builder()
.setBaseOptions(BaseOptions.builder().setModelAssetPath(NO_NMS_MODEL_FILE).build())
.setScoreThreshold(1.0f)
.build();
ObjectDetector objectDetector =
ObjectDetector.createFromOptions(ApplicationProvider.getApplicationContext(), options);
ObjectDetectionResult results = objectDetector.detect(getImageFromAsset(CAT_AND_DOG_IMAGE));
// The score threshold of 1.0 should filter out all objects.
assertThat(results.detections()).isEmpty();
}
@Test @Test
public void detect_succeedsWithAllowListOption() throws Exception { public void detect_succeedsWithAllowListOption() throws Exception {
ObjectDetectorOptions options = ObjectDetectorOptions options =

View File

@ -330,7 +330,6 @@ public class PoseLandmarkerTest {
return PoseLandmarkerResult.create( return PoseLandmarkerResult.create(
Arrays.asList(landmarksDetectionResultProto.getLandmarks()), Arrays.asList(landmarksDetectionResultProto.getLandmarks()),
Arrays.asList(landmarksDetectionResultProto.getWorldLandmarks()), Arrays.asList(landmarksDetectionResultProto.getWorldLandmarks()),
Arrays.asList(),
Optional.empty(), Optional.empty(),
/* timestampMs= */ 0); /* timestampMs= */ 0);
} }

View File

@ -44,6 +44,7 @@ _ObjectDetectorOptions = object_detector.ObjectDetectorOptions
_RUNNING_MODE = running_mode_module.VisionTaskRunningMode _RUNNING_MODE = running_mode_module.VisionTaskRunningMode
_MODEL_FILE = 'coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.tflite' _MODEL_FILE = 'coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.tflite'
_NO_NMS_MODEL_FILE = 'efficientdet_lite0_fp16_no_nms.tflite'
_IMAGE_FILE = 'cats_and_dogs.jpg' _IMAGE_FILE = 'cats_and_dogs.jpg'
_EXPECTED_DETECTION_RESULT = _DetectionResult( _EXPECTED_DETECTION_RESULT = _DetectionResult(
detections=[ detections=[
@ -304,7 +305,7 @@ class ObjectDetectorTest(parameterized.TestCase):
with _ObjectDetector.create_from_options(options) as unused_detector: with _ObjectDetector.create_from_options(options) as unused_detector:
pass pass
def test_empty_detection_outputs(self): def test_empty_detection_outputs_with_in_model_nms(self):
options = _ObjectDetectorOptions( options = _ObjectDetectorOptions(
base_options=_BaseOptions(model_asset_path=self.model_path), base_options=_BaseOptions(model_asset_path=self.model_path),
score_threshold=1, score_threshold=1,
@ -314,6 +315,18 @@ class ObjectDetectorTest(parameterized.TestCase):
detection_result = detector.detect(self.test_image) detection_result = detector.detect(self.test_image)
self.assertEmpty(detection_result.detections) self.assertEmpty(detection_result.detections)
def test_empty_detection_outputs_without_in_model_nms(self):
options = _ObjectDetectorOptions(
base_options=_BaseOptions(
model_asset_path=test_utils.get_test_data_path(
os.path.join(_TEST_DATA_DIR, _NO_NMS_MODEL_FILE))),
score_threshold=1,
)
with _ObjectDetector.create_from_options(options) as detector:
# Performs object detection on the input.
detection_result = detector.detect(self.test_image)
self.assertEmpty(detection_result.detections)
def test_missing_result_callback(self): def test_missing_result_callback(self):
options = _ObjectDetectorOptions( options = _ObjectDetectorOptions(
base_options=_BaseOptions(model_asset_path=self.model_path), base_options=_BaseOptions(model_asset_path=self.model_path),

View File

@ -74,7 +74,6 @@ def _get_expected_pose_landmarker_result(
return PoseLandmarkerResult( return PoseLandmarkerResult(
pose_landmarks=[landmarks_detection_result.landmarks], pose_landmarks=[landmarks_detection_result.landmarks],
pose_world_landmarks=[], pose_world_landmarks=[],
pose_auxiliary_landmarks=[],
) )
@ -296,7 +295,6 @@ class PoseLandmarkerTest(parameterized.TestCase):
# Comparing results. # Comparing results.
self.assertEmpty(detection_result.pose_landmarks) self.assertEmpty(detection_result.pose_landmarks)
self.assertEmpty(detection_result.pose_world_landmarks) self.assertEmpty(detection_result.pose_world_landmarks)
self.assertEmpty(detection_result.pose_auxiliary_landmarks)
def test_missing_result_callback(self): def test_missing_result_callback(self):
options = _PoseLandmarkerOptions( options = _PoseLandmarkerOptions(
@ -391,7 +389,7 @@ class PoseLandmarkerTest(parameterized.TestCase):
True, True,
_get_expected_pose_landmarker_result(_POSE_LANDMARKS), _get_expected_pose_landmarker_result(_POSE_LANDMARKS),
), ),
(_BURGER_IMAGE, 0, False, PoseLandmarkerResult([], [], [])), (_BURGER_IMAGE, 0, False, PoseLandmarkerResult([], [])),
) )
def test_detect_for_video( def test_detect_for_video(
self, image_path, rotation, output_segmentation_masks, expected_result self, image_path, rotation, output_segmentation_masks, expected_result
@ -473,7 +471,7 @@ class PoseLandmarkerTest(parameterized.TestCase):
True, True,
_get_expected_pose_landmarker_result(_POSE_LANDMARKS), _get_expected_pose_landmarker_result(_POSE_LANDMARKS),
), ),
(_BURGER_IMAGE, 0, False, PoseLandmarkerResult([], [], [])), (_BURGER_IMAGE, 0, False, PoseLandmarkerResult([], [])),
) )
def test_detect_async_calls( def test_detect_async_calls(
self, image_path, rotation, output_segmentation_masks, expected_result self, image_path, rotation, output_segmentation_masks, expected_result

View File

@ -198,6 +198,15 @@ class ObjectDetector(base_vision_task_api.BaseVisionTaskApi):
def packets_callback(output_packets: Mapping[str, packet_module.Packet]): def packets_callback(output_packets: Mapping[str, packet_module.Packet]):
if output_packets[_IMAGE_OUT_STREAM_NAME].is_empty(): if output_packets[_IMAGE_OUT_STREAM_NAME].is_empty():
return return
image = packet_getter.get_image(output_packets[_IMAGE_OUT_STREAM_NAME])
if output_packets[_DETECTIONS_OUT_STREAM_NAME].is_empty():
empty_packet = output_packets[_DETECTIONS_OUT_STREAM_NAME]
options.result_callback(
ObjectDetectorResult([]),
image,
empty_packet.timestamp.value // _MICRO_SECONDS_PER_MILLISECOND,
)
return
detection_proto_list = packet_getter.get_proto_list( detection_proto_list = packet_getter.get_proto_list(
output_packets[_DETECTIONS_OUT_STREAM_NAME] output_packets[_DETECTIONS_OUT_STREAM_NAME]
) )
@ -207,7 +216,6 @@ class ObjectDetector(base_vision_task_api.BaseVisionTaskApi):
for result in detection_proto_list for result in detection_proto_list
] ]
) )
image = packet_getter.get_image(output_packets[_IMAGE_OUT_STREAM_NAME])
timestamp = output_packets[_IMAGE_OUT_STREAM_NAME].timestamp timestamp = output_packets[_IMAGE_OUT_STREAM_NAME].timestamp
options.result_callback(detection_result, image, timestamp) options.result_callback(detection_result, image, timestamp)
@ -266,6 +274,8 @@ class ObjectDetector(base_vision_task_api.BaseVisionTaskApi):
normalized_rect.to_pb2() normalized_rect.to_pb2()
), ),
}) })
if output_packets[_DETECTIONS_OUT_STREAM_NAME].is_empty():
return ObjectDetectorResult([])
detection_proto_list = packet_getter.get_proto_list( detection_proto_list = packet_getter.get_proto_list(
output_packets[_DETECTIONS_OUT_STREAM_NAME] output_packets[_DETECTIONS_OUT_STREAM_NAME]
) )
@ -315,6 +325,8 @@ class ObjectDetector(base_vision_task_api.BaseVisionTaskApi):
normalized_rect.to_pb2() normalized_rect.to_pb2()
).at(timestamp_ms * _MICRO_SECONDS_PER_MILLISECOND), ).at(timestamp_ms * _MICRO_SECONDS_PER_MILLISECOND),
}) })
if output_packets[_DETECTIONS_OUT_STREAM_NAME].is_empty():
return ObjectDetectorResult([])
detection_proto_list = packet_getter.get_proto_list( detection_proto_list = packet_getter.get_proto_list(
output_packets[_DETECTIONS_OUT_STREAM_NAME] output_packets[_DETECTIONS_OUT_STREAM_NAME]
) )
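With the empty-packet guards above, callers of the blocking APIs no longer hit a failure when the graph produces no detections; `detect()` and `detect_for_video()` simply return an `ObjectDetectorResult` whose `detections` list is empty. A short caller-side sketch (the names `detector` and `mp_image` are assumed to be created elsewhere):

```python
# Illustrative only: `detector` is an ObjectDetector and `mp_image` a mediapipe
# Image, both created by the caller as usual.
result = detector.detect(mp_image)
if not result.detections:
  print('No detections above the score threshold.')
else:
  best = result.detections[0]
  print(best.bounding_box, best.categories[0].category_name)
```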

View File

@ -49,8 +49,6 @@ _NORM_LANDMARKS_STREAM_NAME = 'norm_landmarks'
_NORM_LANDMARKS_TAG = 'NORM_LANDMARKS' _NORM_LANDMARKS_TAG = 'NORM_LANDMARKS'
_POSE_WORLD_LANDMARKS_STREAM_NAME = 'world_landmarks' _POSE_WORLD_LANDMARKS_STREAM_NAME = 'world_landmarks'
_POSE_WORLD_LANDMARKS_TAG = 'WORLD_LANDMARKS' _POSE_WORLD_LANDMARKS_TAG = 'WORLD_LANDMARKS'
_POSE_AUXILIARY_LANDMARKS_STREAM_NAME = 'auxiliary_landmarks'
_POSE_AUXILIARY_LANDMARKS_TAG = 'AUXILIARY_LANDMARKS'
_TASK_GRAPH_NAME = 'mediapipe.tasks.vision.pose_landmarker.PoseLandmarkerGraph' _TASK_GRAPH_NAME = 'mediapipe.tasks.vision.pose_landmarker.PoseLandmarkerGraph'
_MICRO_SECONDS_PER_MILLISECOND = 1000 _MICRO_SECONDS_PER_MILLISECOND = 1000
@ -62,14 +60,11 @@ class PoseLandmarkerResult:
Attributes: Attributes:
pose_landmarks: Detected pose landmarks in normalized image coordinates. pose_landmarks: Detected pose landmarks in normalized image coordinates.
pose_world_landmarks: Detected pose landmarks in world coordinates. pose_world_landmarks: Detected pose landmarks in world coordinates.
pose_auxiliary_landmarks: Detected auxiliary landmarks, used for deriving
ROI for next frame.
segmentation_masks: Optional segmentation masks for pose. segmentation_masks: Optional segmentation masks for pose.
""" """
pose_landmarks: List[List[landmark_module.NormalizedLandmark]] pose_landmarks: List[List[landmark_module.NormalizedLandmark]]
pose_world_landmarks: List[List[landmark_module.Landmark]] pose_world_landmarks: List[List[landmark_module.Landmark]]
pose_auxiliary_landmarks: List[List[landmark_module.NormalizedLandmark]]
segmentation_masks: Optional[List[image_module.Image]] = None segmentation_masks: Optional[List[image_module.Image]] = None
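With `pose_auxiliary_landmarks` removed, a `PoseLandmarkerResult` now carries just the two landmark lists plus the optional masks declared above. A consumption sketch, assuming a `PoseLandmarker` named `landmarker` and an input `mp_image` created elsewhere:

```python
# Illustrative only; `landmarker` and `mp_image` are assumed to exist.
result = landmarker.detect(mp_image)
for pose_landmarks, pose_world_landmarks in zip(
    result.pose_landmarks, result.pose_world_landmarks):
  nose = pose_landmarks[0]               # NormalizedLandmark: x, y in [0, 1]
  nose_world = pose_world_landmarks[0]   # Landmark: world coordinates
  print(nose.x, nose.y, nose_world.z)
if result.segmentation_masks:            # present only if the option was enabled
  first_mask = result.segmentation_masks[0]  # a mediapipe Image
```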
@ -77,7 +72,7 @@ def _build_landmarker_result(
output_packets: Mapping[str, packet_module.Packet] output_packets: Mapping[str, packet_module.Packet]
) -> PoseLandmarkerResult: ) -> PoseLandmarkerResult:
"""Constructs a `PoseLandmarkerResult` from output packets.""" """Constructs a `PoseLandmarkerResult` from output packets."""
pose_landmarker_result = PoseLandmarkerResult([], [], []) pose_landmarker_result = PoseLandmarkerResult([], [])
if _SEGMENTATION_MASK_STREAM_NAME in output_packets: if _SEGMENTATION_MASK_STREAM_NAME in output_packets:
pose_landmarker_result.segmentation_masks = packet_getter.get_image_list( pose_landmarker_result.segmentation_masks = packet_getter.get_image_list(
@ -90,9 +85,6 @@ def _build_landmarker_result(
pose_world_landmarks_proto_list = packet_getter.get_proto_list( pose_world_landmarks_proto_list = packet_getter.get_proto_list(
output_packets[_POSE_WORLD_LANDMARKS_STREAM_NAME] output_packets[_POSE_WORLD_LANDMARKS_STREAM_NAME]
) )
pose_auxiliary_landmarks_proto_list = packet_getter.get_proto_list(
output_packets[_POSE_AUXILIARY_LANDMARKS_STREAM_NAME]
)
for proto in pose_landmarks_proto_list: for proto in pose_landmarks_proto_list:
pose_landmarks = landmark_pb2.NormalizedLandmarkList() pose_landmarks = landmark_pb2.NormalizedLandmarkList()
@ -116,19 +108,6 @@ def _build_landmarker_result(
pose_world_landmarks_list pose_world_landmarks_list
) )
for proto in pose_auxiliary_landmarks_proto_list:
pose_auxiliary_landmarks = landmark_pb2.NormalizedLandmarkList()
pose_auxiliary_landmarks.MergeFrom(proto)
pose_auxiliary_landmarks_list = []
for pose_auxiliary_landmark in pose_auxiliary_landmarks.landmark:
pose_auxiliary_landmarks_list.append(
landmark_module.NormalizedLandmark.create_from_pb2(
pose_auxiliary_landmark
)
)
pose_landmarker_result.pose_auxiliary_landmarks.append(
pose_auxiliary_landmarks_list
)
return pose_landmarker_result return pose_landmarker_result
@ -301,7 +280,7 @@ class PoseLandmarker(base_vision_task_api.BaseVisionTaskApi):
if output_packets[_NORM_LANDMARKS_STREAM_NAME].is_empty(): if output_packets[_NORM_LANDMARKS_STREAM_NAME].is_empty():
empty_packet = output_packets[_NORM_LANDMARKS_STREAM_NAME] empty_packet = output_packets[_NORM_LANDMARKS_STREAM_NAME]
options.result_callback( options.result_callback(
PoseLandmarkerResult([], [], []), PoseLandmarkerResult([], []),
image, image,
empty_packet.timestamp.value // _MICRO_SECONDS_PER_MILLISECOND, empty_packet.timestamp.value // _MICRO_SECONDS_PER_MILLISECOND,
) )
@ -320,10 +299,6 @@ class PoseLandmarker(base_vision_task_api.BaseVisionTaskApi):
':'.join( ':'.join(
[_POSE_WORLD_LANDMARKS_TAG, _POSE_WORLD_LANDMARKS_STREAM_NAME] [_POSE_WORLD_LANDMARKS_TAG, _POSE_WORLD_LANDMARKS_STREAM_NAME]
), ),
':'.join([
_POSE_AUXILIARY_LANDMARKS_TAG,
_POSE_AUXILIARY_LANDMARKS_STREAM_NAME,
]),
':'.join([_IMAGE_TAG, _IMAGE_OUT_STREAM_NAME]), ':'.join([_IMAGE_TAG, _IMAGE_OUT_STREAM_NAME]),
] ]
@ -382,7 +357,7 @@ class PoseLandmarker(base_vision_task_api.BaseVisionTaskApi):
}) })
if output_packets[_NORM_LANDMARKS_STREAM_NAME].is_empty(): if output_packets[_NORM_LANDMARKS_STREAM_NAME].is_empty():
return PoseLandmarkerResult([], [], []) return PoseLandmarkerResult([], [])
return _build_landmarker_result(output_packets) return _build_landmarker_result(output_packets)
@ -427,7 +402,7 @@ class PoseLandmarker(base_vision_task_api.BaseVisionTaskApi):
}) })
if output_packets[_NORM_LANDMARKS_STREAM_NAME].is_empty(): if output_packets[_NORM_LANDMARKS_STREAM_NAME].is_empty():
return PoseLandmarkerResult([], [], []) return PoseLandmarkerResult([], [])
return _build_landmarker_result(output_packets) return _build_landmarker_result(output_packets)

View File

@ -21,6 +21,7 @@ VISION_LIBS = [
"//mediapipe/tasks/web/core:fileset_resolver", "//mediapipe/tasks/web/core:fileset_resolver",
"//mediapipe/tasks/web/vision/core:drawing_utils", "//mediapipe/tasks/web/vision/core:drawing_utils",
"//mediapipe/tasks/web/vision/core:image", "//mediapipe/tasks/web/vision/core:image",
"//mediapipe/tasks/web/vision/core:mask",
"//mediapipe/tasks/web/vision/face_detector", "//mediapipe/tasks/web/vision/face_detector",
"//mediapipe/tasks/web/vision/face_landmarker", "//mediapipe/tasks/web/vision/face_landmarker",
"//mediapipe/tasks/web/vision/face_stylizer", "//mediapipe/tasks/web/vision/face_stylizer",

View File

@ -41,7 +41,10 @@ mediapipe_ts_library(
mediapipe_ts_library( mediapipe_ts_library(
name = "image", name = "image",
srcs = ["image.ts"], srcs = [
"image.ts",
"image_shader_context.ts",
],
) )
mediapipe_ts_library( mediapipe_ts_library(
@ -56,12 +59,34 @@ jasmine_node_test(
deps = [":image_test_lib"], deps = [":image_test_lib"],
) )
mediapipe_ts_library(
name = "mask",
srcs = ["mask.ts"],
deps = [":image"],
)
mediapipe_ts_library(
name = "mask_test_lib",
testonly = True,
srcs = ["mask.test.ts"],
deps = [
":image",
":mask",
],
)
jasmine_node_test(
name = "mask_test",
deps = [":mask_test_lib"],
)
mediapipe_ts_library( mediapipe_ts_library(
name = "vision_task_runner", name = "vision_task_runner",
srcs = ["vision_task_runner.ts"], srcs = ["vision_task_runner.ts"],
deps = [ deps = [
":image", ":image",
":image_processing_options", ":image_processing_options",
":mask",
":vision_task_options", ":vision_task_options",
"//mediapipe/framework/formats:rect_jspb_proto", "//mediapipe/framework/formats:rect_jspb_proto",
"//mediapipe/tasks/web/core", "//mediapipe/tasks/web/core",
@ -91,7 +116,6 @@ mediapipe_ts_library(
mediapipe_ts_library( mediapipe_ts_library(
name = "render_utils", name = "render_utils",
srcs = ["render_utils.ts"], srcs = ["render_utils.ts"],
deps = [":image"],
) )
jasmine_node_test( jasmine_node_test(

View File

@ -16,7 +16,8 @@
import 'jasmine'; import 'jasmine';
import {MPImage, MPImageShaderContext, MPImageType} from './image'; import {MPImage} from './image';
import {MPImageShaderContext} from './image_shader_context';
const WIDTH = 2; const WIDTH = 2;
const HEIGHT = 2; const HEIGHT = 2;
@ -40,8 +41,6 @@ const IMAGE_2_3 = [
class MPImageTestContext { class MPImageTestContext {
canvas!: OffscreenCanvas; canvas!: OffscreenCanvas;
gl!: WebGL2RenderingContext; gl!: WebGL2RenderingContext;
uint8ClampedArray!: Uint8ClampedArray;
float32Array!: Float32Array;
imageData!: ImageData; imageData!: ImageData;
imageBitmap!: ImageBitmap; imageBitmap!: ImageBitmap;
webGLTexture!: WebGLTexture; webGLTexture!: WebGLTexture;
@ -55,17 +54,11 @@ class MPImageTestContext {
const gl = this.gl; const gl = this.gl;
this.uint8ClampedArray = new Uint8ClampedArray(pixels.length / 4);
this.float32Array = new Float32Array(pixels.length / 4);
for (let i = 0; i < this.uint8ClampedArray.length; ++i) {
this.uint8ClampedArray[i] = pixels[i * 4];
this.float32Array[i] = pixels[i * 4] / 255;
}
this.imageData = this.imageData =
new ImageData(new Uint8ClampedArray(pixels), width, height); new ImageData(new Uint8ClampedArray(pixels), width, height);
this.imageBitmap = await createImageBitmap(this.imageData); this.imageBitmap = await createImageBitmap(this.imageData);
this.webGLTexture = gl.createTexture()!;
this.webGLTexture = gl.createTexture()!;
gl.bindTexture(gl.TEXTURE_2D, this.webGLTexture); gl.bindTexture(gl.TEXTURE_2D, this.webGLTexture);
gl.texImage2D( gl.texImage2D(
gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, this.imageBitmap); gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, this.imageBitmap);
@ -74,10 +67,6 @@ class MPImageTestContext {
get(type: unknown) { get(type: unknown) {
switch (type) { switch (type) {
case Uint8ClampedArray:
return this.uint8ClampedArray;
case Float32Array:
return this.float32Array;
case ImageData: case ImageData:
return this.imageData; return this.imageData;
case ImageBitmap: case ImageBitmap:
@ -125,25 +114,22 @@ class MPImageTestContext {
gl.bindTexture(gl.TEXTURE_2D, null); gl.bindTexture(gl.TEXTURE_2D, null);
// Sanity check
expect(pixels.find(v => !!v)).toBeDefined();
return pixels; return pixels;
} }
function assertEquality(image: MPImage, expected: ImageType): void { function assertEquality(image: MPImage, expected: ImageType): void {
if (expected instanceof Uint8ClampedArray) { if (expected instanceof ImageData) {
const result = image.get(MPImageType.UINT8_CLAMPED_ARRAY); const result = image.getAsImageData();
expect(result).toEqual(expected);
} else if (expected instanceof Float32Array) {
const result = image.get(MPImageType.FLOAT32_ARRAY);
expect(result).toEqual(expected);
} else if (expected instanceof ImageData) {
const result = image.get(MPImageType.IMAGE_DATA);
expect(result).toEqual(expected); expect(result).toEqual(expected);
} else if (expected instanceof ImageBitmap) { } else if (expected instanceof ImageBitmap) {
const result = image.get(MPImageType.IMAGE_BITMAP); const result = image.getAsImageBitmap();
expect(readPixelsFromImageBitmap(result)) expect(readPixelsFromImageBitmap(result))
.toEqual(readPixelsFromImageBitmap(expected)); .toEqual(readPixelsFromImageBitmap(expected));
} else { // WebGLTexture } else { // WebGLTexture
const result = image.get(MPImageType.WEBGL_TEXTURE); const result = image.getAsWebGLTexture();
expect(readPixelsFromWebGLTexture(result)) expect(readPixelsFromWebGLTexture(result))
.toEqual(readPixelsFromWebGLTexture(expected)); .toEqual(readPixelsFromWebGLTexture(expected));
} }
@ -177,9 +163,7 @@ class MPImageTestContext {
shaderContext.close(); shaderContext.close();
} }
const sources = skip ? const sources = skip ? [] : [ImageData, ImageBitmap, WebGLTexture];
[] :
[Uint8ClampedArray, Float32Array, ImageData, ImageBitmap, WebGLTexture];
for (let i = 0; i < sources.length; i++) { for (let i = 0; i < sources.length; i++) {
for (let j = 0; j < sources.length; j++) { for (let j = 0; j < sources.length; j++) {
@ -202,11 +186,11 @@ class MPImageTestContext {
const shaderContext = new MPImageShaderContext(); const shaderContext = new MPImageShaderContext();
const image = new MPImage( const image = new MPImage(
[context.webGLTexture], [context.webGLTexture], /* ownsImageBitmap= */ false,
/* ownsImageBitmap= */ false, /* ownsWebGLTexture= */ false, /* ownsWebGLTexture= */ false, context.canvas, shaderContext, WIDTH,
context.canvas, shaderContext, WIDTH, HEIGHT); HEIGHT);
const result = image.clone().get(MPImageType.IMAGE_DATA); const result = image.clone().getAsImageData();
expect(result).toEqual(context.imageData); expect(result).toEqual(context.imageData);
shaderContext.close(); shaderContext.close();
@ -217,19 +201,19 @@ class MPImageTestContext {
const shaderContext = new MPImageShaderContext(); const shaderContext = new MPImageShaderContext();
const image = new MPImage( const image = new MPImage(
[context.webGLTexture], [context.webGLTexture], /* ownsImageBitmap= */ false,
/* ownsImageBitmap= */ false, /* ownsWebGLTexture= */ false, /* ownsWebGLTexture= */ false, context.canvas, shaderContext, WIDTH,
context.canvas, shaderContext, WIDTH, HEIGHT); HEIGHT);
// Verify that we can mix the different shader modes by running them out of // Verify that we can mix the different shader modes by running them out of
// order. // order.
let result = image.get(MPImageType.IMAGE_DATA); let result = image.getAsImageData();
expect(result).toEqual(context.imageData); expect(result).toEqual(context.imageData);
result = image.clone().get(MPImageType.IMAGE_DATA); result = image.clone().getAsImageData();
expect(result).toEqual(context.imageData); expect(result).toEqual(context.imageData);
result = image.get(MPImageType.IMAGE_DATA); result = image.getAsImageData();
expect(result).toEqual(context.imageData); expect(result).toEqual(context.imageData);
shaderContext.close(); shaderContext.close();
@ -241,43 +225,21 @@ class MPImageTestContext {
const shaderContext = new MPImageShaderContext(); const shaderContext = new MPImageShaderContext();
const image = createImage(shaderContext, context.imageData, WIDTH, HEIGHT); const image = createImage(shaderContext, context.imageData, WIDTH, HEIGHT);
expect(image.has(MPImageType.IMAGE_DATA)).toBe(true); expect(image.hasImageData()).toBe(true);
expect(image.has(MPImageType.UINT8_CLAMPED_ARRAY)).toBe(false); expect(image.hasWebGLTexture()).toBe(false);
expect(image.has(MPImageType.FLOAT32_ARRAY)).toBe(false); expect(image.hasImageBitmap()).toBe(false);
expect(image.has(MPImageType.WEBGL_TEXTURE)).toBe(false);
expect(image.has(MPImageType.IMAGE_BITMAP)).toBe(false);
image.get(MPImageType.UINT8_CLAMPED_ARRAY); image.getAsWebGLTexture();
expect(image.has(MPImageType.IMAGE_DATA)).toBe(true); expect(image.hasImageData()).toBe(true);
expect(image.has(MPImageType.UINT8_CLAMPED_ARRAY)).toBe(true); expect(image.hasWebGLTexture()).toBe(true);
expect(image.has(MPImageType.FLOAT32_ARRAY)).toBe(false); expect(image.hasImageBitmap()).toBe(false);
expect(image.has(MPImageType.WEBGL_TEXTURE)).toBe(false);
expect(image.has(MPImageType.IMAGE_BITMAP)).toBe(false);
image.get(MPImageType.FLOAT32_ARRAY); image.getAsImageBitmap();
expect(image.has(MPImageType.IMAGE_DATA)).toBe(true); expect(image.hasImageData()).toBe(true);
expect(image.has(MPImageType.UINT8_CLAMPED_ARRAY)).toBe(true); expect(image.hasWebGLTexture()).toBe(true);
expect(image.has(MPImageType.FLOAT32_ARRAY)).toBe(true); expect(image.hasImageBitmap()).toBe(true);
expect(image.has(MPImageType.WEBGL_TEXTURE)).toBe(false);
expect(image.has(MPImageType.IMAGE_BITMAP)).toBe(false);
image.get(MPImageType.WEBGL_TEXTURE);
expect(image.has(MPImageType.IMAGE_DATA)).toBe(true);
expect(image.has(MPImageType.UINT8_CLAMPED_ARRAY)).toBe(true);
expect(image.has(MPImageType.FLOAT32_ARRAY)).toBe(true);
expect(image.has(MPImageType.WEBGL_TEXTURE)).toBe(true);
expect(image.has(MPImageType.IMAGE_BITMAP)).toBe(false);
image.get(MPImageType.IMAGE_BITMAP);
expect(image.has(MPImageType.IMAGE_DATA)).toBe(true);
expect(image.has(MPImageType.UINT8_CLAMPED_ARRAY)).toBe(true);
expect(image.has(MPImageType.FLOAT32_ARRAY)).toBe(true);
expect(image.has(MPImageType.WEBGL_TEXTURE)).toBe(true);
expect(image.has(MPImageType.IMAGE_BITMAP)).toBe(true);
image.close(); image.close();
shaderContext.close(); shaderContext.close();

View File

@ -14,14 +14,10 @@
* limitations under the License. * limitations under the License.
*/ */
import {assertNotNull, MPImageShaderContext} from '../../../../tasks/web/vision/core/image_shader_context';
/** The underlying type of the image. */ /** The underlying type of the image. */
export enum MPImageType { enum MPImageType {
/** Represents the native `UInt8ClampedArray` type. */
UINT8_CLAMPED_ARRAY,
/**
* Represents the native `Float32Array` type. Values range from [0.0, 1.0].
*/
FLOAT32_ARRAY,
/** Represents the native `ImageData` type. */ /** Represents the native `ImageData` type. */
IMAGE_DATA, IMAGE_DATA,
/** Represents the native `ImageBitmap` type. */ /** Represents the native `ImageBitmap` type. */
@ -31,377 +27,16 @@ export enum MPImageType {
} }
/** The supported image formats. For internal usage. */ /** The supported image formats. For internal usage. */
export type MPImageContainer = export type MPImageContainer = ImageData|ImageBitmap|WebGLTexture;
Uint8ClampedArray|Float32Array|ImageData|ImageBitmap|WebGLTexture;
const VERTEX_SHADER = `
attribute vec2 aVertex;
attribute vec2 aTex;
varying vec2 vTex;
void main(void) {
gl_Position = vec4(aVertex, 0.0, 1.0);
vTex = aTex;
}`;
const FRAGMENT_SHADER = `
precision mediump float;
varying vec2 vTex;
uniform sampler2D inputTexture;
void main() {
gl_FragColor = texture2D(inputTexture, vTex);
}
`;
function assertNotNull<T>(value: T|null, msg: string): T {
if (value === null) {
throw new Error(`Unable to obtain required WebGL resource: ${msg}`);
}
return value;
}
// TODO: Move internal-only types to different module.
/**
* Utility class that encapsulates the buffers used by `MPImageShaderContext`.
* For internal use only.
*/
class MPImageShaderBuffers {
constructor(
private readonly gl: WebGL2RenderingContext,
private readonly vertexArrayObject: WebGLVertexArrayObject,
private readonly vertexBuffer: WebGLBuffer,
private readonly textureBuffer: WebGLBuffer) {}
bind() {
this.gl.bindVertexArray(this.vertexArrayObject);
}
unbind() {
this.gl.bindVertexArray(null);
}
close() {
this.gl.deleteVertexArray(this.vertexArrayObject);
this.gl.deleteBuffer(this.vertexBuffer);
this.gl.deleteBuffer(this.textureBuffer);
}
}
/**
* A class that encapsulates the shaders used by an MPImage. Can be re-used
* across MPImages that use the same WebGL2Rendering context.
*
* For internal use only.
*/
export class MPImageShaderContext {
private gl?: WebGL2RenderingContext;
private framebuffer?: WebGLFramebuffer;
private program?: WebGLProgram;
private vertexShader?: WebGLShader;
private fragmentShader?: WebGLShader;
private aVertex?: GLint;
private aTex?: GLint;
/**
* The shader buffers used for passthrough renders that don't modify the
* input texture.
*/
private shaderBuffersPassthrough?: MPImageShaderBuffers;
/**
* The shader buffers used for passthrough renders that flip the input texture
* vertically before conversion to a different type. This is used to flip the
* texture to the expected orientation for drawing in the browser.
*/
private shaderBuffersFlipVertically?: MPImageShaderBuffers;
private compileShader(source: string, type: number): WebGLShader {
const gl = this.gl!;
const shader =
assertNotNull(gl.createShader(type), 'Failed to create WebGL shader');
gl.shaderSource(shader, source);
gl.compileShader(shader);
if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
const info = gl.getShaderInfoLog(shader);
throw new Error(`Could not compile WebGL shader: ${info}`);
}
gl.attachShader(this.program!, shader);
return shader;
}
private setupShaders(): void {
const gl = this.gl!;
this.program =
assertNotNull(gl.createProgram()!, 'Failed to create WebGL program');
this.vertexShader = this.compileShader(VERTEX_SHADER, gl.VERTEX_SHADER);
this.fragmentShader =
this.compileShader(FRAGMENT_SHADER, gl.FRAGMENT_SHADER);
gl.linkProgram(this.program);
const linked = gl.getProgramParameter(this.program, gl.LINK_STATUS);
if (!linked) {
const info = gl.getProgramInfoLog(this.program);
throw new Error(`Error during program linking: ${info}`);
}
this.aVertex = gl.getAttribLocation(this.program, 'aVertex');
this.aTex = gl.getAttribLocation(this.program, 'aTex');
}
private createBuffers(flipVertically: boolean): MPImageShaderBuffers {
const gl = this.gl!;
const vertexArrayObject =
assertNotNull(gl.createVertexArray(), 'Failed to create vertex array');
gl.bindVertexArray(vertexArrayObject);
const vertexBuffer =
assertNotNull(gl.createBuffer(), 'Failed to create buffer');
gl.bindBuffer(gl.ARRAY_BUFFER, vertexBuffer);
gl.enableVertexAttribArray(this.aVertex!);
gl.vertexAttribPointer(this.aVertex!, 2, gl.FLOAT, false, 0, 0);
gl.bufferData(
gl.ARRAY_BUFFER, new Float32Array([-1, -1, -1, 1, 1, 1, 1, -1]),
gl.STATIC_DRAW);
const textureBuffer =
assertNotNull(gl.createBuffer(), 'Failed to create buffer');
gl.bindBuffer(gl.ARRAY_BUFFER, textureBuffer);
gl.enableVertexAttribArray(this.aTex!);
gl.vertexAttribPointer(this.aTex!, 2, gl.FLOAT, false, 0, 0);
const bufferData =
flipVertically ? [0, 1, 0, 0, 1, 0, 1, 1] : [0, 0, 0, 1, 1, 1, 1, 0];
gl.bufferData(
gl.ARRAY_BUFFER, new Float32Array(bufferData), gl.STATIC_DRAW);
gl.bindBuffer(gl.ARRAY_BUFFER, null);
gl.bindVertexArray(null);
return new MPImageShaderBuffers(
gl, vertexArrayObject, vertexBuffer, textureBuffer);
}
private getShaderBuffers(flipVertically: boolean): MPImageShaderBuffers {
if (flipVertically) {
if (!this.shaderBuffersFlipVertically) {
this.shaderBuffersFlipVertically =
this.createBuffers(/* flipVertically= */ true);
}
return this.shaderBuffersFlipVertically;
} else {
if (!this.shaderBuffersPassthrough) {
this.shaderBuffersPassthrough =
this.createBuffers(/* flipVertically= */ false);
}
return this.shaderBuffersPassthrough;
}
}
private maybeInitGL(gl: WebGL2RenderingContext): void {
if (!this.gl) {
this.gl = gl;
} else if (gl !== this.gl) {
throw new Error('Cannot change GL context once initialized');
}
}
/** Runs the callback using the shader. */
run<T>(
gl: WebGL2RenderingContext, flipVertically: boolean,
callback: () => T): T {
this.maybeInitGL(gl);
if (!this.program) {
this.setupShaders();
}
const shaderBuffers = this.getShaderBuffers(flipVertically);
gl.useProgram(this.program!);
shaderBuffers.bind();
const result = callback();
shaderBuffers.unbind();
return result;
}
/**
* Binds a framebuffer to the canvas. If the framebuffer does not yet exist,
* creates it first. Binds the provided texture to the framebuffer.
*/
bindFramebuffer(gl: WebGL2RenderingContext, texture: WebGLTexture): void {
this.maybeInitGL(gl);
if (!this.framebuffer) {
this.framebuffer =
assertNotNull(gl.createFramebuffer(), 'Failed to create framebuffer.');
}
gl.bindFramebuffer(gl.FRAMEBUFFER, this.framebuffer);
gl.framebufferTexture2D(
gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, texture, 0);
}
unbindFramebuffer(): void {
this.gl?.bindFramebuffer(this.gl.FRAMEBUFFER, null);
}
close() {
if (this.program) {
const gl = this.gl!;
gl.deleteProgram(this.program);
gl.deleteShader(this.vertexShader!);
gl.deleteShader(this.fragmentShader!);
}
if (this.framebuffer) {
this.gl!.deleteFramebuffer(this.framebuffer);
}
if (this.shaderBuffersPassthrough) {
this.shaderBuffersPassthrough.close();
}
if (this.shaderBuffersFlipVertically) {
this.shaderBuffersFlipVertically.close();
}
}
}
/** A four channel color with a red, green, blue and alpha values. */
export type RGBAColor = [number, number, number, number];
/**
* An interface that can be used to provide custom conversion functions. These
* functions are invoked to convert pixel values between different channel
* counts and value ranges. Any conversion function that is not specified will
* result in a default conversion.
*/
export interface MPImageChannelConverter {
/**
* A conversion function to convert a number in the [0.0, 1.0] range to RGBA.
* The output is an array with four elements whose values range from 0 to 255
* inclusive.
*
* The default conversion function is `[v * 255, v * 255, v * 255, 255]`
* and will log a warning if invoked.
*/
floatToRGBAConverter?: (value: number) => RGBAColor;
/*
* A conversion function to convert a number in the [0, 255] range to RGBA.
* The output is an array with four elements whose values range from 0 to 255
* inclusive.
*
* The default conversion function is `[v, v , v , 255]` and will log a
* warning if invoked.
*/
uint8ToRGBAConverter?: (value: number) => RGBAColor;
/**
* A conversion function to convert an RGBA value in the range of 0 to 255 to
* a single value in the [0.0, 1.0] range.
*
* The default conversion function is `(r / 3 + g / 3 + b / 3) / 255` and will
* log a warning if invoked.
*/
rgbaToFloatConverter?: (r: number, g: number, b: number, a: number) => number;
/**
* A conversion function to convert an RGBA value in the range of 0 to 255 to
* a single value in the [0, 255] range.
*
* The default conversion function is `r / 3 + g / 3 + b / 3` and will log a
* warning if invoked.
*/
rgbaToUint8Converter?: (r: number, g: number, b: number, a: number) => number;
/**
* A conversion function to convert a single value in the 0.0 to 1.0 range to
* [0, 255].
*
* The default conversion function is `r * 255` and will log a warning if
* invoked.
*/
floatToUint8Converter?: (value: number) => number;
/**
* A conversion function to convert a single value in the 0 to 255 range to
* [0.0, 1.0] .
*
* The default conversion function is `r / 255` and will log a warning if
* invoked.
*/
uint8ToFloatConverter?: (value: number) => number;
}
/**
* Color converter that falls back to a default implementation if the
* user-provided converter does not specify a conversion.
*/
class DefaultColorConverter implements Required<MPImageChannelConverter> {
private static readonly WARNINGS_LOGGED = new Set<string>();
constructor(private readonly customConverter: MPImageChannelConverter) {}
floatToRGBAConverter(v: number): RGBAColor {
if (this.customConverter.floatToRGBAConverter) {
return this.customConverter.floatToRGBAConverter(v);
}
this.logWarningOnce('floatToRGBAConverter');
return [v * 255, v * 255, v * 255, 255];
}
uint8ToRGBAConverter(v: number): RGBAColor {
if (this.customConverter.uint8ToRGBAConverter) {
return this.customConverter.uint8ToRGBAConverter(v);
}
this.logWarningOnce('uint8ToRGBAConverter');
return [v, v, v, 255];
}
rgbaToFloatConverter(r: number, g: number, b: number, a: number): number {
if (this.customConverter.rgbaToFloatConverter) {
return this.customConverter.rgbaToFloatConverter(r, g, b, a);
}
this.logWarningOnce('rgbaToFloatConverter');
return (r / 3 + g / 3 + b / 3) / 255;
}
rgbaToUint8Converter(r: number, g: number, b: number, a: number): number {
if (this.customConverter.rgbaToUint8Converter) {
return this.customConverter.rgbaToUint8Converter(r, g, b, a);
}
this.logWarningOnce('rgbaToUint8Converter');
return r / 3 + g / 3 + b / 3;
}
floatToUint8Converter(v: number): number {
if (this.customConverter.floatToUint8Converter) {
return this.customConverter.floatToUint8Converter(v);
}
this.logWarningOnce('floatToUint8Converter');
return v * 255;
}
uint8ToFloatConverter(v: number): number {
if (this.customConverter.uint8ToFloatConverter) {
return this.customConverter.uint8ToFloatConverter(v);
}
this.logWarningOnce('uint8ToFloatConverter');
return v / 255;
}
private logWarningOnce(methodName: string): void {
if (!DefaultColorConverter.WARNINGS_LOGGED.has(methodName)) {
console.log(`Using default ${methodName}`);
DefaultColorConverter.WARNINGS_LOGGED.add(methodName);
}
}
}
/** /**
* The wrapper class for MediaPipe Image objects. * The wrapper class for MediaPipe Image objects.
* *
* Images are stored as `ImageData`, `ImageBitmap` or `WebGLTexture` objects. * Images are stored as `ImageData`, `ImageBitmap` or `WebGLTexture` objects.
* You can convert the underlying type to any other type by passing the * You can convert the underlying type to any other type by calling the
* desired type to `get()`. As type conversions can be expensive, it is * corresponding `getAs...()` method. As type conversions can be expensive, it is
* recommended to limit these conversions. You can verify what underlying * recommended to limit these conversions. You can verify what underlying
* types are already available by invoking `has()`. * types are already available by invoking `has...()`.
* *
* Images that are returned from a MediaPipe Task are owned by the * Images that are returned from a MediaPipe Task are owned by the
* underlying C++ Task. If you need to extend the lifetime of these objects, * underlying C++ Task. If you need to extend the lifetime of these objects,
@ -413,21 +48,10 @@ class DefaultColorConverter implements Required<MPImageChannelConverter> {
* initialized with an `OffscreenCanvas`. As we require WebGL2 support, this * initialized with an `OffscreenCanvas`. As we require WebGL2 support, this
* places some limitations on Browser support as outlined here: * places some limitations on Browser support as outlined here:
* https://developer.mozilla.org/en-US/docs/Web/API/OffscreenCanvas/getContext * https://developer.mozilla.org/en-US/docs/Web/API/OffscreenCanvas/getContext
*
* Some MediaPipe tasks return single channel masks. These masks are stored
* using an underlying `Uint8ClampedArray` or `Float32Array` (represented as
* single-channel arrays). To convert these types to other formats, a conversion
* function is invoked to convert pixel values between single channel and four
* channel RGBA values. To customize this conversion, you can specify these
* conversion functions when you invoke `get()`. If you use the default
* conversion function a warning will be logged to the console.
*/ */
export class MPImage { export class MPImage {
private gl?: WebGL2RenderingContext; private gl?: WebGL2RenderingContext;
/** The underlying type of the image. */
static TYPE = MPImageType;
/** @hideconstructor */ /** @hideconstructor */
constructor( constructor(
private readonly containers: MPImageContainer[], private readonly containers: MPImageContainer[],
@ -442,113 +66,60 @@ export class MPImage {
readonly height: number, readonly height: number,
) {} ) {}
/** /** Returns whether this `MPImage` contains an image of type `ImageData`. */
* Returns whether this `MPImage` stores the image in the desired format. hasImageData(): boolean {
* This method can be called to reduce expensive conversion before invoking return !!this.getContainer(MPImageType.IMAGE_DATA);
* `get()`. }
*/
has(type: MPImageType): boolean { /** Returns whether this `MPImage` contains an image of type `ImageBitmap`. */
return !!this.getContainer(type); hasImageBitmap(): boolean {
return !!this.getContainer(MPImageType.IMAGE_BITMAP);
}
/** Returns whether this `MPImage` contains an image of type `WebGLTexture`. */
hasWebGLTexture(): boolean {
return !!this.getContainer(MPImageType.WEBGL_TEXTURE);
} }
/**
* Returns the underlying image as a single channel `Uint8ClampedArray`. Note
* that this involves an expensive GPU to CPU transfer if the current image is
* only available as an `ImageBitmap` or `WebGLTexture`. If necessary, this
* function converts RGBA data pixel-by-pixel to a single channel value by
* invoking a conversion function (see class comment for detail).
*
* @param type The type of image to return.
* @param converter A set of conversion functions that will be invoked to
* convert the underlying pixel data if necessary. You may omit this
* function if the requested conversion does not change the pixel format.
* @return The current data as a Uint8ClampedArray.
*/
get(type: MPImageType.UINT8_CLAMPED_ARRAY,
converter?: MPImageChannelConverter): Uint8ClampedArray;
/**
* Returns the underlying image as a single channel `Float32Array`. Note
* that this involves an expensive GPU to CPU transfer if the current image is
* only available as an `ImageBitmap` or `WebGLTexture`. If necessary, this
* function converts RGBA data pixel-by-pixel to a single channel value by
* invoking a conversion function (see class comment for detail).
*
* @param type The type of image to return.
* @param converter A set of conversion functions that will be invoked to
* convert the underlying pixel data if necessary. You may omit this
* function if the requested conversion does not change the pixel format.
* @return The current image as a Float32Array.
*/
get(type: MPImageType.FLOAT32_ARRAY,
converter?: MPImageChannelConverter): Float32Array;
/** /**
* Returns the underlying image as an `ImageData` object. Note that this * Returns the underlying image as an `ImageData` object. Note that this
* involves an expensive GPU to CPU transfer if the current image is only * involves an expensive GPU to CPU transfer if the current image is only
* available as an `ImageBitmap` or `WebGLTexture`. If necessary, this * available as an `ImageBitmap` or `WebGLTexture`.
* function converts single channel pixel values to RGBA by invoking a
* conversion function (see class comment for detail).
* *
* @return The current image as an ImageData object. * @return The current image as an ImageData object.
*/ */
get(type: MPImageType.IMAGE_DATA, getAsImageData(): ImageData {
converter?: MPImageChannelConverter): ImageData; return this.convertToImageData();
}
/** /**
* Returns the underlying image as an `ImageBitmap`. Note that * Returns the underlying image as an `ImageBitmap`. Note that
* conversions to `ImageBitmap` are expensive, especially if the data * conversions to `ImageBitmap` are expensive, especially if the data
* currently resides on CPU. If necessary, this function first converts single * currently resides on CPU.
* channel pixel values to RGBA by invoking a conversion function (see class
* comment for detail).
* *
* Processing with `ImageBitmap`s requires that the MediaPipe Task was * Processing with `ImageBitmap`s requires that the MediaPipe Task was
* initialized with an `OffscreenCanvas` with WebGL2 support. See * initialized with an `OffscreenCanvas` with WebGL2 support. See
* https://developer.mozilla.org/en-US/docs/Web/API/OffscreenCanvas/getContext * https://developer.mozilla.org/en-US/docs/Web/API/OffscreenCanvas/getContext
* for a list of supported platforms. * for a list of supported platforms.
* *
* @param type The type of image to return.
* @param converter A set of conversion functions that will be invoked to
* convert the underlying pixel data if necessary. You may omit this
* function if the requested conversion does not change the pixel format.
* @return The current image as an ImageBitmap object. * @return The current image as an ImageBitmap object.
*/ */
get(type: MPImageType.IMAGE_BITMAP, getAsImageBitmap(): ImageBitmap {
converter?: MPImageChannelConverter): ImageBitmap; return this.convertToImageBitmap();
}
/** /**
* Returns the underlying image as a `WebGLTexture` object. Note that this * Returns the underlying image as a `WebGLTexture` object. Note that this
* involves a CPU to GPU transfer if the current image is only available as * involves a CPU to GPU transfer if the current image is only available as
* an `ImageData` object. The returned texture is bound to the current * an `ImageData` object. The returned texture is bound to the current
* canvas (see `.canvas`). * canvas (see `.canvas`).
* *
* @param type The type of image to return.
* @param converter A set of conversion functions that will be invoked to
* convert the underlying pixel data if necessary. You may omit this
* function if the requested conversion does not change the pixel format.
* @return The current image as a WebGLTexture. * @return The current image as a WebGLTexture.
*/ */
get(type: MPImageType.WEBGL_TEXTURE, getAsWebGLTexture(): WebGLTexture {
converter?: MPImageChannelConverter): WebGLTexture; return this.convertToWebGLTexture();
get(type?: MPImageType,
converter?: MPImageChannelConverter): MPImageContainer {
const internalConverter = new DefaultColorConverter(converter ?? {});
switch (type) {
case MPImageType.UINT8_CLAMPED_ARRAY:
return this.convertToUint8ClampedArray(internalConverter);
case MPImageType.FLOAT32_ARRAY:
return this.convertToFloat32Array(internalConverter);
case MPImageType.IMAGE_DATA:
return this.convertToImageData(internalConverter);
case MPImageType.IMAGE_BITMAP:
return this.convertToImageBitmap(internalConverter);
case MPImageType.WEBGL_TEXTURE:
return this.convertToWebGLTexture(internalConverter);
default:
throw new Error(`Type is not supported: ${type}`);
}
} }
private getContainer(type: MPImageType.UINT8_CLAMPED_ARRAY): Uint8ClampedArray
|undefined;
private getContainer(type: MPImageType.FLOAT32_ARRAY): Float32Array|undefined;
private getContainer(type: MPImageType.IMAGE_DATA): ImageData|undefined; private getContainer(type: MPImageType.IMAGE_DATA): ImageData|undefined;
private getContainer(type: MPImageType.IMAGE_BITMAP): ImageBitmap|undefined; private getContainer(type: MPImageType.IMAGE_BITMAP): ImageBitmap|undefined;
private getContainer(type: MPImageType.WEBGL_TEXTURE): WebGLTexture|undefined; private getContainer(type: MPImageType.WEBGL_TEXTURE): WebGLTexture|undefined;
@ -556,16 +127,16 @@ export class MPImage {
/** Returns the container for the requested storage type iff it exists. */ /** Returns the container for the requested storage type iff it exists. */
private getContainer(type: MPImageType): MPImageContainer|undefined { private getContainer(type: MPImageType): MPImageContainer|undefined {
switch (type) { switch (type) {
case MPImageType.UINT8_CLAMPED_ARRAY:
return this.containers.find(img => img instanceof Uint8ClampedArray);
case MPImageType.FLOAT32_ARRAY:
return this.containers.find(img => img instanceof Float32Array);
case MPImageType.IMAGE_DATA: case MPImageType.IMAGE_DATA:
return this.containers.find(img => img instanceof ImageData); return this.containers.find(img => img instanceof ImageData);
case MPImageType.IMAGE_BITMAP: case MPImageType.IMAGE_BITMAP:
return this.containers.find(img => img instanceof ImageBitmap); return this.containers.find(
img => typeof ImageBitmap !== 'undefined' &&
img instanceof ImageBitmap);
case MPImageType.WEBGL_TEXTURE: case MPImageType.WEBGL_TEXTURE:
return this.containers.find(img => img instanceof WebGLTexture); return this.containers.find(
img => typeof WebGLTexture !== 'undefined' &&
img instanceof WebGLTexture);
default: default:
throw new Error(`Type is not supported: ${type}`); throw new Error(`Type is not supported: ${type}`);
} }
@ -586,11 +157,7 @@ export class MPImage {
for (const container of this.containers) { for (const container of this.containers) {
let destinationContainer: MPImageContainer; let destinationContainer: MPImageContainer;
if (container instanceof Uint8ClampedArray) { if (container instanceof ImageData) {
destinationContainer = new Uint8ClampedArray(container);
} else if (container instanceof Float32Array) {
destinationContainer = new Float32Array(container);
} else if (container instanceof ImageData) {
destinationContainer = destinationContainer =
new ImageData(container.data, this.width, this.height); new ImageData(container.data, this.width, this.height);
} else if (container instanceof WebGLTexture) { } else if (container instanceof WebGLTexture) {
@ -619,7 +186,7 @@ export class MPImage {
this.unbindTexture(); this.unbindTexture();
} else if (container instanceof ImageBitmap) { } else if (container instanceof ImageBitmap) {
this.convertToWebGLTexture(new DefaultColorConverter({})); this.convertToWebGLTexture();
this.bindTexture(); this.bindTexture();
destinationContainer = this.copyTextureToBitmap(); destinationContainer = this.copyTextureToBitmap();
this.unbindTexture(); this.unbindTexture();
@ -631,9 +198,8 @@ export class MPImage {
} }
return new MPImage( return new MPImage(
destinationContainers, this.has(MPImageType.IMAGE_BITMAP), destinationContainers, this.hasImageBitmap(), this.hasWebGLTexture(),
this.has(MPImageType.WEBGL_TEXTURE), this.canvas, this.shaderContext, this.canvas, this.shaderContext, this.width, this.height);
this.width, this.height);
} }
private getOffscreenCanvas(): OffscreenCanvas { private getOffscreenCanvas(): OffscreenCanvas {
@ -667,11 +233,10 @@ export class MPImage {
return this.shaderContext; return this.shaderContext;
} }
private convertToImageBitmap(converter: Required<MPImageChannelConverter>): private convertToImageBitmap(): ImageBitmap {
ImageBitmap {
let imageBitmap = this.getContainer(MPImageType.IMAGE_BITMAP); let imageBitmap = this.getContainer(MPImageType.IMAGE_BITMAP);
if (!imageBitmap) { if (!imageBitmap) {
this.convertToWebGLTexture(converter); this.convertToWebGLTexture();
imageBitmap = this.convertWebGLTextureToImageBitmap(); imageBitmap = this.convertWebGLTextureToImageBitmap();
this.containers.push(imageBitmap); this.containers.push(imageBitmap);
this.ownsImageBitmap = true; this.ownsImageBitmap = true;
@ -680,43 +245,15 @@ export class MPImage {
return imageBitmap; return imageBitmap;
} }
private convertToImageData(converter: Required<MPImageChannelConverter>): private convertToImageData(): ImageData {
ImageData {
let imageData = this.getContainer(MPImageType.IMAGE_DATA); let imageData = this.getContainer(MPImageType.IMAGE_DATA);
if (!imageData) { if (!imageData) {
if (this.has(MPImageType.UINT8_CLAMPED_ARRAY)) {
const source = this.getContainer(MPImageType.UINT8_CLAMPED_ARRAY)!;
const destination = new Uint8ClampedArray(this.width * this.height * 4);
for (let i = 0; i < this.width * this.height; i++) {
const rgba = converter.uint8ToRGBAConverter(source[i]);
destination[i * 4] = rgba[0];
destination[i * 4 + 1] = rgba[1];
destination[i * 4 + 2] = rgba[2];
destination[i * 4 + 3] = rgba[3];
}
imageData = new ImageData(destination, this.width, this.height);
this.containers.push(imageData);
} else if (this.has(MPImageType.FLOAT32_ARRAY)) {
const source = this.getContainer(MPImageType.FLOAT32_ARRAY)!;
const destination = new Uint8ClampedArray(this.width * this.height * 4);
for (let i = 0; i < this.width * this.height; i++) {
const rgba = converter.floatToRGBAConverter(source[i]);
destination[i * 4] = rgba[0];
destination[i * 4 + 1] = rgba[1];
destination[i * 4 + 2] = rgba[2];
destination[i * 4 + 3] = rgba[3];
}
imageData = new ImageData(destination, this.width, this.height);
this.containers.push(imageData);
} else if (
this.has(MPImageType.IMAGE_BITMAP) ||
this.has(MPImageType.WEBGL_TEXTURE)) {
const gl = this.getGL(); const gl = this.getGL();
const shaderContext = this.getShaderContext(); const shaderContext = this.getShaderContext();
const pixels = new Uint8Array(this.width * this.height * 4); const pixels = new Uint8Array(this.width * this.height * 4);
// Create texture if needed // Create texture if needed
const webGlTexture = this.convertToWebGLTexture(converter); const webGlTexture = this.convertToWebGLTexture();
// Create a framebuffer from the texture and read back pixels // Create a framebuffer from the texture and read back pixels
shaderContext.bindFramebuffer(gl, webGlTexture); shaderContext.bindFramebuffer(gl, webGlTexture);
@ -727,68 +264,18 @@ export class MPImage {
imageData = new ImageData( imageData = new ImageData(
new Uint8ClampedArray(pixels.buffer), this.width, this.height); new Uint8ClampedArray(pixels.buffer), this.width, this.height);
this.containers.push(imageData); this.containers.push(imageData);
} else {
throw new Error('Couldn\t find backing image for ImageData conversion');
}
} }
return imageData; return imageData;
} }
private convertToUint8ClampedArray( private convertToWebGLTexture(): WebGLTexture {
converter: Required<MPImageChannelConverter>): Uint8ClampedArray {
let uint8ClampedArray = this.getContainer(MPImageType.UINT8_CLAMPED_ARRAY);
if (!uint8ClampedArray) {
if (this.has(MPImageType.FLOAT32_ARRAY)) {
const source = this.getContainer(MPImageType.FLOAT32_ARRAY)!;
uint8ClampedArray = new Uint8ClampedArray(
source.map(v => converter.floatToUint8Converter(v)));
} else {
const source = this.convertToImageData(converter).data;
uint8ClampedArray = new Uint8ClampedArray(this.width * this.height);
for (let i = 0; i < this.width * this.height; i++) {
uint8ClampedArray[i] = converter.rgbaToUint8Converter(
source[i * 4], source[i * 4 + 1], source[i * 4 + 2],
source[i * 4 + 3]);
}
}
this.containers.push(uint8ClampedArray);
}
return uint8ClampedArray;
}
private convertToFloat32Array(converter: Required<MPImageChannelConverter>):
Float32Array {
let float32Array = this.getContainer(MPImageType.FLOAT32_ARRAY);
if (!float32Array) {
if (this.has(MPImageType.UINT8_CLAMPED_ARRAY)) {
const source = this.getContainer(MPImageType.UINT8_CLAMPED_ARRAY)!;
float32Array = new Float32Array(source).map(
v => converter.uint8ToFloatConverter(v));
} else {
const source = this.convertToImageData(converter).data;
float32Array = new Float32Array(this.width * this.height);
for (let i = 0; i < this.width * this.height; i++) {
float32Array[i] = converter.rgbaToFloatConverter(
source[i * 4], source[i * 4 + 1], source[i * 4 + 2],
source[i * 4 + 3]);
}
}
this.containers.push(float32Array);
}
return float32Array;
}
private convertToWebGLTexture(converter: Required<MPImageChannelConverter>):
WebGLTexture {
let webGLTexture = this.getContainer(MPImageType.WEBGL_TEXTURE); let webGLTexture = this.getContainer(MPImageType.WEBGL_TEXTURE);
if (!webGLTexture) { if (!webGLTexture) {
const gl = this.getGL(); const gl = this.getGL();
webGLTexture = this.bindTexture(); webGLTexture = this.bindTexture();
const source = this.getContainer(MPImageType.IMAGE_BITMAP) || const source = this.getContainer(MPImageType.IMAGE_BITMAP) ||
this.convertToImageData(converter); this.convertToImageData();
gl.texImage2D( gl.texImage2D(
gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, source); gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, source);
this.unbindTexture(); this.unbindTexture();

View File

@ -0,0 +1,243 @@
/**
* Copyright 2023 The MediaPipe Authors.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
const VERTEX_SHADER = `
attribute vec2 aVertex;
attribute vec2 aTex;
varying vec2 vTex;
void main(void) {
gl_Position = vec4(aVertex, 0.0, 1.0);
vTex = aTex;
}`;
const FRAGMENT_SHADER = `
precision mediump float;
varying vec2 vTex;
uniform sampler2D inputTexture;
void main() {
gl_FragColor = texture2D(inputTexture, vTex);
}
`;
/** Helper to assert that `value` is not null. */
export function assertNotNull<T>(value: T|null, msg: string): T {
if (value === null) {
throw new Error(`Unable to obtain required WebGL resource: ${msg}`);
}
return value;
}
/**
* Utility class that encapsulates the buffers used by `MPImageShaderContext`.
* For internal use only.
*/
class MPImageShaderBuffers {
constructor(
private readonly gl: WebGL2RenderingContext,
private readonly vertexArrayObject: WebGLVertexArrayObject,
private readonly vertexBuffer: WebGLBuffer,
private readonly textureBuffer: WebGLBuffer) {}
bind() {
this.gl.bindVertexArray(this.vertexArrayObject);
}
unbind() {
this.gl.bindVertexArray(null);
}
close() {
this.gl.deleteVertexArray(this.vertexArrayObject);
this.gl.deleteBuffer(this.vertexBuffer);
this.gl.deleteBuffer(this.textureBuffer);
}
}
/**
* A class that encapsulates the shaders used by an MPImage. Can be re-used
 * across MPImages that use the same WebGL2RenderingContext.
*
* For internal use only.
*/
export class MPImageShaderContext {
private gl?: WebGL2RenderingContext;
private framebuffer?: WebGLFramebuffer;
private program?: WebGLProgram;
private vertexShader?: WebGLShader;
private fragmentShader?: WebGLShader;
private aVertex?: GLint;
private aTex?: GLint;
/**
* The shader buffers used for passthrough renders that don't modify the
* input texture.
*/
private shaderBuffersPassthrough?: MPImageShaderBuffers;
/**
* The shader buffers used for passthrough renders that flip the input texture
* vertically before conversion to a different type. This is used to flip the
* texture to the expected orientation for drawing in the browser.
*/
private shaderBuffersFlipVertically?: MPImageShaderBuffers;
private compileShader(source: string, type: number): WebGLShader {
const gl = this.gl!;
const shader =
assertNotNull(gl.createShader(type), 'Failed to create WebGL shader');
gl.shaderSource(shader, source);
gl.compileShader(shader);
if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
const info = gl.getShaderInfoLog(shader);
throw new Error(`Could not compile WebGL shader: ${info}`);
}
gl.attachShader(this.program!, shader);
return shader;
}
private setupShaders(): void {
const gl = this.gl!;
this.program =
assertNotNull(gl.createProgram()!, 'Failed to create WebGL program');
this.vertexShader = this.compileShader(VERTEX_SHADER, gl.VERTEX_SHADER);
this.fragmentShader =
this.compileShader(FRAGMENT_SHADER, gl.FRAGMENT_SHADER);
gl.linkProgram(this.program);
const linked = gl.getProgramParameter(this.program, gl.LINK_STATUS);
if (!linked) {
const info = gl.getProgramInfoLog(this.program);
throw new Error(`Error during program linking: ${info}`);
}
this.aVertex = gl.getAttribLocation(this.program, 'aVertex');
this.aTex = gl.getAttribLocation(this.program, 'aTex');
}
private createBuffers(flipVertically: boolean): MPImageShaderBuffers {
const gl = this.gl!;
const vertexArrayObject =
assertNotNull(gl.createVertexArray(), 'Failed to create vertex array');
gl.bindVertexArray(vertexArrayObject);
const vertexBuffer =
assertNotNull(gl.createBuffer(), 'Failed to create buffer');
gl.bindBuffer(gl.ARRAY_BUFFER, vertexBuffer);
gl.enableVertexAttribArray(this.aVertex!);
gl.vertexAttribPointer(this.aVertex!, 2, gl.FLOAT, false, 0, 0);
gl.bufferData(
gl.ARRAY_BUFFER, new Float32Array([-1, -1, -1, 1, 1, 1, 1, -1]),
gl.STATIC_DRAW);
const textureBuffer =
assertNotNull(gl.createBuffer(), 'Failed to create buffer');
gl.bindBuffer(gl.ARRAY_BUFFER, textureBuffer);
gl.enableVertexAttribArray(this.aTex!);
gl.vertexAttribPointer(this.aTex!, 2, gl.FLOAT, false, 0, 0);
const bufferData =
flipVertically ? [0, 1, 0, 0, 1, 0, 1, 1] : [0, 0, 0, 1, 1, 1, 1, 0];
gl.bufferData(
gl.ARRAY_BUFFER, new Float32Array(bufferData), gl.STATIC_DRAW);
gl.bindBuffer(gl.ARRAY_BUFFER, null);
gl.bindVertexArray(null);
return new MPImageShaderBuffers(
gl, vertexArrayObject, vertexBuffer, textureBuffer);
}
private getShaderBuffers(flipVertically: boolean): MPImageShaderBuffers {
if (flipVertically) {
if (!this.shaderBuffersFlipVertically) {
this.shaderBuffersFlipVertically =
this.createBuffers(/* flipVertically= */ true);
}
return this.shaderBuffersFlipVertically;
} else {
if (!this.shaderBuffersPassthrough) {
this.shaderBuffersPassthrough =
this.createBuffers(/* flipVertically= */ false);
}
return this.shaderBuffersPassthrough;
}
}
private maybeInitGL(gl: WebGL2RenderingContext): void {
if (!this.gl) {
this.gl = gl;
} else if (gl !== this.gl) {
throw new Error('Cannot change GL context once initialized');
}
}
/** Runs the callback using the shader. */
run<T>(
gl: WebGL2RenderingContext, flipVertically: boolean,
callback: () => T): T {
this.maybeInitGL(gl);
if (!this.program) {
this.setupShaders();
}
const shaderBuffers = this.getShaderBuffers(flipVertically);
gl.useProgram(this.program!);
shaderBuffers.bind();
const result = callback();
shaderBuffers.unbind();
return result;
}
/**
* Binds a framebuffer to the canvas. If the framebuffer does not yet exist,
* creates it first. Binds the provided texture to the framebuffer.
*/
bindFramebuffer(gl: WebGL2RenderingContext, texture: WebGLTexture): void {
this.maybeInitGL(gl);
if (!this.framebuffer) {
this.framebuffer =
assertNotNull(gl.createFramebuffer(), 'Failed to create framebuffer.');
}
gl.bindFramebuffer(gl.FRAMEBUFFER, this.framebuffer);
gl.framebufferTexture2D(
gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, texture, 0);
}
unbindFramebuffer(): void {
this.gl?.bindFramebuffer(this.gl.FRAMEBUFFER, null);
}
close() {
if (this.program) {
const gl = this.gl!;
gl.deleteProgram(this.program);
gl.deleteShader(this.vertexShader!);
gl.deleteShader(this.fragmentShader!);
}
if (this.framebuffer) {
this.gl!.deleteFramebuffer(this.framebuffer);
}
if (this.shaderBuffersPassthrough) {
this.shaderBuffersPassthrough.close();
}
if (this.shaderBuffersFlipVertically) {
this.shaderBuffersFlipVertically.close();
}
}
}
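As a quick orientation for how this class is used, here is a hedged sketch of the read-back path that the mask and image conversion code in this commit relies on. The import path follows the tests in this commit, and the caller is assumed to supply a texture holding R32F data of size `width` x `height`:

```ts
import {MPImageShaderContext} from './image_shader_context';  // path as used by the tests in this commit

// Sketch only: attach a single-channel float texture to the context's
// framebuffer and read it back to the CPU. Assumes `gl` has the
// EXT_color_buffer_float extension enabled.
function readFloatPixels(
    shaderContext: MPImageShaderContext, gl: WebGL2RenderingContext,
    texture: WebGLTexture, width: number, height: number): Float32Array {
  const pixels = new Float32Array(width * height);
  shaderContext.bindFramebuffer(gl, texture);
  gl.readPixels(0, 0, width, height, gl.RED, gl.FLOAT, pixels);
  shaderContext.unbindFramebuffer();
  return pixels;
}
```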

View File

@ -0,0 +1,269 @@
/**
* Copyright 2022 The MediaPipe Authors.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import 'jasmine';
import {MPImageShaderContext} from './image_shader_context';
import {MPMask} from './mask';
const WIDTH = 2;
const HEIGHT = 2;
const skip = typeof document === 'undefined';
if (skip) {
console.log('These tests must be run in a browser.');
}
/** The mask types supported by MPMask. */
type MaskType = Uint8Array|Float32Array|WebGLTexture;
const MASK_2_1 = [1, 2];
const MASK_2_2 = [1, 2, 3, 4];
const MASK_2_3 = [1, 2, 3, 4, 5, 6];
/** The test images and data to use for the unit tests below. */
class MPMaskTestContext {
canvas!: OffscreenCanvas;
gl!: WebGL2RenderingContext;
uint8Array!: Uint8Array;
float32Array!: Float32Array;
webGLTexture!: WebGLTexture;
async init(pixels = MASK_2_2, width = WIDTH, height = HEIGHT): Promise<void> {
// Initialize a canvas with default dimensions. Note that the canvas size
// can be different from the mask size.
this.canvas = new OffscreenCanvas(WIDTH, HEIGHT);
this.gl = this.canvas.getContext('webgl2') as WebGL2RenderingContext;
const gl = this.gl;
if (!gl.getExtension('EXT_color_buffer_float')) {
throw new Error('Missing required EXT_color_buffer_float extension');
}
this.uint8Array = new Uint8Array(pixels);
this.float32Array = new Float32Array(pixels.length);
for (let i = 0; i < this.uint8Array.length; ++i) {
this.float32Array[i] = pixels[i] / 255;
}
this.webGLTexture = gl.createTexture()!;
gl.bindTexture(gl.TEXTURE_2D, this.webGLTexture);
gl.texImage2D(
gl.TEXTURE_2D, 0, gl.R32F, width, height, 0, gl.RED, gl.FLOAT,
new Float32Array(pixels).map(v => v / 255));
gl.bindTexture(gl.TEXTURE_2D, null);
}
get(type: unknown) {
switch (type) {
case Uint8Array:
return this.uint8Array;
case Float32Array:
return this.float32Array;
case WebGLTexture:
return this.webGLTexture;
default:
throw new Error(`Unsupported type: ${type}`);
}
}
close(): void {
this.gl.deleteTexture(this.webGLTexture);
}
}
(skip ? xdescribe : describe)('MPMask', () => {
const context = new MPMaskTestContext();
afterEach(() => {
context.close();
});
function readPixelsFromWebGLTexture(texture: WebGLTexture): Float32Array {
const pixels = new Float32Array(WIDTH * HEIGHT);
const gl = context.gl;
gl.bindTexture(gl.TEXTURE_2D, texture);
const framebuffer = gl.createFramebuffer()!;
gl.bindFramebuffer(gl.FRAMEBUFFER, framebuffer);
gl.framebufferTexture2D(
gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, texture, 0);
gl.readPixels(0, 0, WIDTH, HEIGHT, gl.RED, gl.FLOAT, pixels);
gl.bindFramebuffer(gl.FRAMEBUFFER, null);
gl.deleteFramebuffer(framebuffer);
gl.bindTexture(gl.TEXTURE_2D, null);
// Sanity check values
expect(pixels[0]).not.toBe(0);
return pixels;
}
function assertEquality(mask: MPMask, expected: MaskType): void {
if (expected instanceof Uint8Array) {
const result = mask.getAsUint8Array();
expect(result).toEqual(expected);
} else if (expected instanceof Float32Array) {
const result = mask.getAsFloat32Array();
expect(result).toEqual(expected);
} else { // WebGLTexture
const result = mask.getAsWebGLTexture();
expect(readPixelsFromWebGLTexture(result))
.toEqual(readPixelsFromWebGLTexture(expected));
}
}
function createImage(
shaderContext: MPImageShaderContext, input: MaskType, width: number,
height: number): MPMask {
return new MPMask(
[input],
/* ownsWebGLTexture= */ false, context.canvas, shaderContext, width,
height);
}
function runConversionTest(
input: MaskType, output: MaskType, width = WIDTH, height = HEIGHT): void {
const shaderContext = new MPImageShaderContext();
const mask = createImage(shaderContext, input, width, height);
assertEquality(mask, output);
mask.close();
shaderContext.close();
}
function runCloneTest(input: MaskType): void {
const shaderContext = new MPImageShaderContext();
const mask = createImage(shaderContext, input, WIDTH, HEIGHT);
const clone = mask.clone();
assertEquality(clone, input);
clone.close();
shaderContext.close();
}
const sources = skip ? [] : [Uint8Array, Float32Array, WebGLTexture];
for (let i = 0; i < sources.length; i++) {
for (let j = 0; j < sources.length; j++) {
it(`converts from ${sources[i].name} to ${sources[j].name}`, async () => {
await context.init();
runConversionTest(context.get(sources[i]), context.get(sources[j]));
});
}
}
for (let i = 0; i < sources.length; i++) {
it(`clones ${sources[i].name}`, async () => {
await context.init();
runCloneTest(context.get(sources[i]));
});
}
it(`does not flip textures twice`, async () => {
await context.init();
const shaderContext = new MPImageShaderContext();
const mask = new MPMask(
[context.webGLTexture],
/* ownsWebGLTexture= */ false, context.canvas, shaderContext, WIDTH,
HEIGHT);
const result = mask.clone().getAsUint8Array();
expect(result).toEqual(context.uint8Array);
shaderContext.close();
});
it(`can clone and get mask`, async () => {
await context.init();
const shaderContext = new MPImageShaderContext();
const mask = new MPMask(
[context.webGLTexture],
/* ownsWebGLTexture= */ false, context.canvas, shaderContext, WIDTH,
HEIGHT);
// Verify that we can mix the different shader modes by running them out of
// order.
let result = mask.getAsUint8Array();
expect(result).toEqual(context.uint8Array);
result = mask.clone().getAsUint8Array();
expect(result).toEqual(context.uint8Array);
result = mask.getAsUint8Array();
expect(result).toEqual(context.uint8Array);
shaderContext.close();
});
it('supports has()', async () => {
await context.init();
const shaderContext = new MPImageShaderContext();
const mask = createImage(shaderContext, context.uint8Array, WIDTH, HEIGHT);
expect(mask.hasUint8Array()).toBe(true);
expect(mask.hasFloat32Array()).toBe(false);
expect(mask.hasWebGLTexture()).toBe(false);
mask.getAsFloat32Array();
expect(mask.hasUint8Array()).toBe(true);
expect(mask.hasFloat32Array()).toBe(true);
expect(mask.hasWebGLTexture()).toBe(false);
mask.getAsWebGLTexture();
expect(mask.hasUint8Array()).toBe(true);
expect(mask.hasFloat32Array()).toBe(true);
expect(mask.hasWebGLTexture()).toBe(true);
mask.close();
shaderContext.close();
});
it('supports mask that is smaller than the canvas', async () => {
await context.init(MASK_2_1, /* width= */ 2, /* height= */ 1);
runConversionTest(
context.uint8Array, context.webGLTexture, /* width= */ 2,
/* height= */ 1);
runConversionTest(
context.webGLTexture, context.float32Array, /* width= */ 2,
/* height= */ 1);
runConversionTest(
context.float32Array, context.uint8Array, /* width= */ 2,
/* height= */ 1);
context.close();
});
it('supports mask that is larger than the canvas', async () => {
await context.init(MASK_2_3, /* width= */ 2, /* height= */ 3);
runConversionTest(
context.uint8Array, context.webGLTexture, /* width= */ 2,
/* height= */ 3);
runConversionTest(
context.webGLTexture, context.float32Array, /* width= */ 2,
/* height= */ 3);
runConversionTest(
context.float32Array, context.uint8Array, /* width= */ 2,
/* height= */ 3);
});
});

View File

@ -0,0 +1,315 @@
/**
* Copyright 2023 The MediaPipe Authors.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import {assertNotNull, MPImageShaderContext} from '../../../../tasks/web/vision/core/image_shader_context';
/** The underlying type of the image. */
enum MPMaskType {
/** Represents the native `UInt8Array` type. */
UINT8_ARRAY,
/** Represents the native `Float32Array` type. */
FLOAT32_ARRAY,
/** Represents the native `WebGLTexture` type. */
WEBGL_TEXTURE
}
/** The supported mask formats. For internal usage. */
export type MPMaskContainer = Uint8Array|Float32Array|WebGLTexture;
/**
* The wrapper class for MediaPipe segmentation masks.
*
* Masks are stored as `Uint8Array`, `Float32Array` or `WebGLTexture` objects.
* You can convert the underlying type to any other type by passing the desired
* type to `getAs...()`. As type conversions can be expensive, it is recommended
* to limit these conversions. You can verify what underlying types are already
* available by invoking `has...()`.
*
 * Masks that are returned from a MediaPipe Task are owned by the
* underlying C++ Task. If you need to extend the lifetime of these objects,
* you can invoke the `clone()` method. To free up the resources obtained
* during any clone or type conversion operation, it is important to invoke
* `close()` on the `MPMask` instance.
*/
export class MPMask {
private gl?: WebGL2RenderingContext;
/** @hideconstructor */
constructor(
private readonly containers: MPMaskContainer[],
private ownsWebGLTexture: boolean,
/** Returns the canvas element that the mask is bound to. */
readonly canvas: HTMLCanvasElement|OffscreenCanvas|undefined,
private shaderContext: MPImageShaderContext|undefined,
/** Returns the width of the mask. */
readonly width: number,
/** Returns the height of the mask. */
readonly height: number,
) {}
/** Returns whether this `MPMask` contains a mask of type `Uint8Array`. */
hasUint8Array(): boolean {
return !!this.getContainer(MPMaskType.UINT8_ARRAY);
}
/** Returns whether this `MPMask` contains a mask of type `Float32Array`. */
hasFloat32Array(): boolean {
return !!this.getContainer(MPMaskType.FLOAT32_ARRAY);
}
/** Returns whether this `MPMask` contains a mask of type `WebGLTexture`. */
hasWebGLTexture(): boolean {
return !!this.getContainer(MPMaskType.WEBGL_TEXTURE);
}
/**
 * Returns the underlying mask as a `Uint8Array`. Note that this involves an
* expensive GPU to CPU transfer if the current mask is only available as a
* `WebGLTexture`.
*
* @return The current data as a Uint8Array.
*/
getAsUint8Array(): Uint8Array {
return this.convertToUint8Array();
}
/**
* Returns the underlying mask as a single channel `Float32Array`. Note that
* this involves an expensive GPU to CPU transfer if the current mask is only
* available as a `WebGLTexture`.
*
* @return The current mask as a Float32Array.
*/
getAsFloat32Array(): Float32Array {
return this.convertToFloat32Array();
}
/**
* Returns the underlying mask as a `WebGLTexture` object. Note that this
* involves a CPU to GPU transfer if the current mask is only available as
* a CPU array. The returned texture is bound to the current canvas (see
* `.canvas`).
*
* @return The current mask as a WebGLTexture.
*/
getAsWebGLTexture(): WebGLTexture {
return this.convertToWebGLTexture();
}
private getContainer(type: MPMaskType.UINT8_ARRAY): Uint8Array|undefined;
private getContainer(type: MPMaskType.FLOAT32_ARRAY): Float32Array|undefined;
private getContainer(type: MPMaskType.WEBGL_TEXTURE): WebGLTexture|undefined;
private getContainer(type: MPMaskType): MPMaskContainer|undefined;
/** Returns the container for the requested storage type iff it exists. */
private getContainer(type: MPMaskType): MPMaskContainer|undefined {
switch (type) {
case MPMaskType.UINT8_ARRAY:
return this.containers.find(img => img instanceof Uint8Array);
case MPMaskType.FLOAT32_ARRAY:
return this.containers.find(img => img instanceof Float32Array);
case MPMaskType.WEBGL_TEXTURE:
return this.containers.find(
img => typeof WebGLTexture !== 'undefined' &&
img instanceof WebGLTexture);
default:
throw new Error(`Type is not supported: ${type}`);
}
}
/**
* Creates a copy of the resources stored in this `MPMask`. You can
* invoke this method to extend the lifetime of a mask returned by a
* MediaPipe Task. Note that performance critical applications should aim to
* only use the `MPMask` within the MediaPipe Task callback so that
* copies can be avoided.
*/
clone(): MPMask {
const destinationContainers: MPMaskContainer[] = [];
// TODO: We might only want to clone one backing data structure
// even if multiple are defined.
for (const container of this.containers) {
let destinationContainer: MPMaskContainer;
if (container instanceof Uint8Array) {
destinationContainer = new Uint8Array(container);
} else if (container instanceof Float32Array) {
destinationContainer = new Float32Array(container);
} else if (container instanceof WebGLTexture) {
const gl = this.getGL();
const shaderContext = this.getShaderContext();
// Create a new texture and use it to back a framebuffer
gl.activeTexture(gl.TEXTURE1);
destinationContainer =
assertNotNull(gl.createTexture(), 'Failed to create texture');
gl.bindTexture(gl.TEXTURE_2D, destinationContainer);
gl.texImage2D(
gl.TEXTURE_2D, 0, gl.R32F, this.width, this.height, 0, gl.RED,
gl.FLOAT, null);
gl.bindTexture(gl.TEXTURE_2D, null);
shaderContext.bindFramebuffer(gl, destinationContainer);
shaderContext.run(gl, /* flipVertically= */ false, () => {
this.bindTexture(); // This activates gl.TEXTURE0
gl.clearColor(0, 0, 0, 0);
gl.clear(gl.COLOR_BUFFER_BIT);
gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
this.unbindTexture();
});
shaderContext.unbindFramebuffer();
this.unbindTexture();
} else {
throw new Error(`Type is not supported: ${container}`);
}
destinationContainers.push(destinationContainer);
}
return new MPMask(
destinationContainers, this.hasWebGLTexture(), this.canvas,
this.shaderContext, this.width, this.height);
}
private getGL(): WebGL2RenderingContext {
if (!this.canvas) {
throw new Error(
'Conversion to different image formats requires that a canvas ' +
'is passed when initializing the image.');
}
if (!this.gl) {
this.gl = assertNotNull(
this.canvas.getContext('webgl2') as WebGL2RenderingContext | null,
'You cannot use a canvas that is already bound to a different ' +
'type of rendering context.');
}
const ext = this.gl.getExtension('EXT_color_buffer_float');
if (!ext) {
// TODO: Ensure this works on iOS
throw new Error('Missing required EXT_color_buffer_float extension');
}
return this.gl;
}
private getShaderContext(): MPImageShaderContext {
if (!this.shaderContext) {
this.shaderContext = new MPImageShaderContext();
}
return this.shaderContext;
}
private convertToFloat32Array(): Float32Array {
let float32Array = this.getContainer(MPMaskType.FLOAT32_ARRAY);
if (!float32Array) {
const uint8Array = this.getContainer(MPMaskType.UINT8_ARRAY);
if (uint8Array) {
float32Array = new Float32Array(uint8Array).map(v => v / 255);
} else {
const gl = this.getGL();
const shaderContext = this.getShaderContext();
float32Array = new Float32Array(this.width * this.height);
// Create texture if needed
const webGlTexture = this.convertToWebGLTexture();
// Create a framebuffer from the texture and read back pixels
shaderContext.bindFramebuffer(gl, webGlTexture);
gl.readPixels(
0, 0, this.width, this.height, gl.RED, gl.FLOAT, float32Array);
shaderContext.unbindFramebuffer();
}
this.containers.push(float32Array);
}
return float32Array;
}
private convertToUint8Array(): Uint8Array {
let uint8Array = this.getContainer(MPMaskType.UINT8_ARRAY);
if (!uint8Array) {
const floatArray = this.convertToFloat32Array();
uint8Array = new Uint8Array(floatArray.map(v => 255 * v));
this.containers.push(uint8Array);
}
return uint8Array;
}
private convertToWebGLTexture(): WebGLTexture {
let webGLTexture = this.getContainer(MPMaskType.WEBGL_TEXTURE);
if (!webGLTexture) {
const gl = this.getGL();
webGLTexture = this.bindTexture();
const data = this.convertToFloat32Array();
// TODO: Add support for R16F to support iOS
gl.texImage2D(
gl.TEXTURE_2D, 0, gl.R32F, this.width, this.height, 0, gl.RED,
gl.FLOAT, data);
this.unbindTexture();
}
return webGLTexture;
}
/**
* Binds the backing texture to the canvas. If the texture does not yet
* exist, creates it first.
*/
private bindTexture(): WebGLTexture {
const gl = this.getGL();
gl.viewport(0, 0, this.width, this.height);
gl.activeTexture(gl.TEXTURE0);
let webGLTexture = this.getContainer(MPMaskType.WEBGL_TEXTURE);
if (!webGLTexture) {
webGLTexture =
assertNotNull(gl.createTexture(), 'Failed to create texture');
this.containers.push(webGLTexture);
this.ownsWebGLTexture = true;
}
gl.bindTexture(gl.TEXTURE_2D, webGLTexture);
// TODO: Ideally, we would only set these once per texture and
// not once every frame.
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
return webGLTexture;
}
private unbindTexture(): void {
this.gl!.bindTexture(this.gl!.TEXTURE_2D, null);
}
/**
* Frees up any resources owned by this `MPMask` instance.
*
* Note that this method does not free masks that are owned by the C++
* Task, as these are freed automatically once you leave the MediaPipe
* callback. Additionally, some shared state is freed only once you invoke
* the Task's `close()` method.
*/
close(): void {
if (this.ownsWebGLTexture) {
const gl = this.getGL();
gl.deleteTexture(this.getContainer(MPMaskType.WEBGL_TEXTURE)!);
}
}
}
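A short, hedged usage sketch for the class above; the import path matches the test file in this commit, and the mask is assumed to come from a vision task callback:

```ts
import {MPMask} from './mask';  // relative path as used by mask.test.ts

// Hedged sketch: read a mask while making the potentially expensive
// GPU-to-CPU transfer explicit, and clone it if it must outlive the
// MediaPipe task callback that delivered it.
function inspectMask(mask: MPMask): MPMask {
  if (!mask.hasUint8Array() && !mask.hasFloat32Array()) {
    // Only a WebGLTexture is available; getAsFloat32Array() will read the
    // texture back from the GPU.
    console.log('Mask is GPU-only; conversion will copy it to the CPU.');
  }
  const values = mask.getAsFloat32Array();
  console.log(`Mask ${mask.width}x${mask.height}, first value ${values[0]}`);

  // Keep a copy beyond the callback; the caller must close() it eventually.
  return mask.clone();
}
```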

View File

@ -16,8 +16,6 @@
* limitations under the License. * limitations under the License.
*/ */
import {MPImageChannelConverter} from '../../../../tasks/web/vision/core/image';
// Pre-baked color table for a maximum of 12 classes. // Pre-baked color table for a maximum of 12 classes.
const CM_ALPHA = 128; const CM_ALPHA = 128;
const COLOR_MAP: Array<[number, number, number, number]> = [ const COLOR_MAP: Array<[number, number, number, number]> = [
@ -35,8 +33,37 @@ const COLOR_MAP: Array<[number, number, number, number]> = [
[255, 255, 255, CM_ALPHA] // class 11 is white; could do black instead? [255, 255, 255, CM_ALPHA] // class 11 is white; could do black instead?
]; ];
/** The color converter we use in our demos. */
export const RENDER_UTIL_CONVERTER: MPImageChannelConverter = { /** Helper function to draw a confidence mask */
floatToRGBAConverter: v => [128, 0, 0, v * 255], export function drawConfidenceMask(
uint8ToRGBAConverter: v => COLOR_MAP[v % COLOR_MAP.length], ctx: CanvasRenderingContext2D, image: Float32Array, width: number,
}; height: number): void {
const uint8Array = new Uint8ClampedArray(width * height * 4);
for (let i = 0; i < image.length; i++) {
uint8Array[4 * i] = 128;
uint8Array[4 * i + 1] = 0;
uint8Array[4 * i + 2] = 0;
uint8Array[4 * i + 3] = image[i] * 255;
}
ctx.putImageData(new ImageData(uint8Array, width, height), 0, 0);
}
/**
* Helper function to draw a category mask. For GPU, we only have F32Arrays
* for now.
*/
export function drawCategoryMask(
ctx: CanvasRenderingContext2D, image: Uint8Array|Float32Array,
width: number, height: number): void {
const rgbaArray = new Uint8ClampedArray(width * height * 4);
const isFloatArray = image instanceof Float32Array;
for (let i = 0; i < image.length; i++) {
const colorIndex = isFloatArray ? Math.round(image[i] * 255) : image[i];
const color = COLOR_MAP[colorIndex % COLOR_MAP.length];
rgbaArray[4 * i] = color[0];
rgbaArray[4 * i + 1] = color[1];
rgbaArray[4 * i + 2] = color[2];
rgbaArray[4 * i + 3] = color[3];
}
ctx.putImageData(new ImageData(rgbaArray, width, height), 0, 0);
}
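A hedged example of driving the two helpers above; the import path is an assumption, the canvases are created on the fly, and the mask contents are synthetic stand-ins for `MPMask.getAsFloat32Array()` / `getAsUint8Array()` output:

```ts
import {drawCategoryMask, drawConfidenceMask} from './render_utils';  // path assumed

const width = 4;
const height = 4;

// Confidence mask: values in [0, 1], rendered as red with alpha = confidence.
const confidenceCanvas = document.createElement('canvas');
confidenceCanvas.width = width;
confidenceCanvas.height = height;
drawConfidenceMask(
    confidenceCanvas.getContext('2d')!,
    new Float32Array(width * height).fill(0.5), width, height);

// Category mask: each index selects an entry from the pre-baked COLOR_MAP.
const categoryCanvas = document.createElement('canvas');
categoryCanvas.width = width;
categoryCanvas.height = height;
drawCategoryMask(
    categoryCanvas.getContext('2d')!, new Uint8Array(width * height).fill(1),
    width, height);
```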

View File

@ -19,7 +19,10 @@ import {NormalizedKeypoint} from '../../../../tasks/web/components/containers/ke
/** A Region-Of-Interest (ROI) to represent a region within an image. */ /** A Region-Of-Interest (ROI) to represent a region within an image. */
export declare interface RegionOfInterest { export declare interface RegionOfInterest {
/** The ROI in keypoint format. */ /** The ROI in keypoint format. */
keypoint: NormalizedKeypoint; keypoint?: NormalizedKeypoint;
/** The ROI as scribbles over the object that the user wants to segment. */
scribble?: NormalizedKeypoint[];
} }
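The two forms this change allows look roughly as follows; the coordinates are placeholders normalized to [0, 1], and the task that consumes the ROI (for example the interactive segmenter) is assumed rather than shown:

```ts
// Point-based ROI: a single normalized keypoint on the object of interest.
// (Assumes NormalizedKeypoint only requires `x` and `y`.)
const pointRoi: RegionOfInterest = {
  keypoint: {x: 0.5, y: 0.5},
};

// Scribble-based ROI: a free-form stroke over the object, given as a list of
// normalized keypoints.
const scribbleRoi: RegionOfInterest = {
  scribble: [{x: 0.25, y: 0.3}, {x: 0.4, y: 0.35}, {x: 0.55, y: 0.4}],
};
```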
/** A connection between two landmarks. */ /** A connection between two landmarks. */

View File

@ -17,8 +17,10 @@
import {NormalizedRect} from '../../../../framework/formats/rect_pb'; import {NormalizedRect} from '../../../../framework/formats/rect_pb';
import {TaskRunner} from '../../../../tasks/web/core/task_runner'; import {TaskRunner} from '../../../../tasks/web/core/task_runner';
import {WasmFileset} from '../../../../tasks/web/core/wasm_fileset'; import {WasmFileset} from '../../../../tasks/web/core/wasm_fileset';
import {MPImage, MPImageShaderContext} from '../../../../tasks/web/vision/core/image'; import {MPImage} from '../../../../tasks/web/vision/core/image';
import {ImageProcessingOptions} from '../../../../tasks/web/vision/core/image_processing_options'; import {ImageProcessingOptions} from '../../../../tasks/web/vision/core/image_processing_options';
import {MPImageShaderContext} from '../../../../tasks/web/vision/core/image_shader_context';
import {MPMask} from '../../../../tasks/web/vision/core/mask';
import {GraphRunner, ImageSource, WasmMediaPipeConstructor} from '../../../../web/graph_runner/graph_runner'; import {GraphRunner, ImageSource, WasmMediaPipeConstructor} from '../../../../web/graph_runner/graph_runner';
import {SupportImage, WasmImage} from '../../../../web/graph_runner/graph_runner_image_lib'; import {SupportImage, WasmImage} from '../../../../web/graph_runner/graph_runner_image_lib';
import {isWebKit} from '../../../../web/graph_runner/platform_utils'; import {isWebKit} from '../../../../web/graph_runner/platform_utils';
@ -57,11 +59,6 @@ export abstract class VisionTaskRunner extends TaskRunner {
protected static async createVisionInstance<T extends VisionTaskRunner>( protected static async createVisionInstance<T extends VisionTaskRunner>(
type: WasmMediaPipeConstructor<T>, fileset: WasmFileset, type: WasmMediaPipeConstructor<T>, fileset: WasmFileset,
options: VisionTaskOptions): Promise<T> { options: VisionTaskOptions): Promise<T> {
if (options.baseOptions?.delegate === 'GPU') {
if (!options.canvas) {
throw new Error('You must specify a canvas for GPU processing.');
}
}
const canvas = options.canvas ?? createCanvas(); const canvas = options.canvas ?? createCanvas();
return TaskRunner.createInstance(type, canvas, fileset, options); return TaskRunner.createInstance(type, canvas, fileset, options);
} }
@ -225,19 +222,18 @@ export abstract class VisionTaskRunner extends TaskRunner {
/** /**
* Converts a WasmImage to an MPImage. * Converts a WasmImage to an MPImage.
* *
* Converts the underlying Uint8ClampedArray-backed images to ImageData * Converts the underlying Uint8Array-backed images to ImageData
* (adding an alpha channel if necessary), passes through WebGLTextures and * (adding an alpha channel if necessary), passes through WebGLTextures and
* throws for Float32Array-backed images. * throws for Float32Array-backed images.
*/ */
protected convertToMPImage(wasmImage: WasmImage): MPImage { protected convertToMPImage(wasmImage: WasmImage, shouldCopyData: boolean):
MPImage {
const {data, width, height} = wasmImage; const {data, width, height} = wasmImage;
const pixels = width * height; const pixels = width * height;
let container: ImageData|WebGLTexture|Uint8ClampedArray; let container: ImageData|WebGLTexture;
if (data instanceof Uint8ClampedArray) { if (data instanceof Uint8Array) {
if (data.length === pixels) { if (data.length === pixels * 3) {
container = data; // Mask
} else if (data.length === pixels * 3) {
// TODO: Convert in C++ // TODO: Convert in C++
const rgba = new Uint8ClampedArray(pixels * 4); const rgba = new Uint8ClampedArray(pixels * 4);
for (let i = 0; i < pixels; ++i) { for (let i = 0; i < pixels; ++i) {
@ -247,25 +243,48 @@ export abstract class VisionTaskRunner extends TaskRunner {
rgba[4 * i + 3] = 255; rgba[4 * i + 3] = 255;
} }
container = new ImageData(rgba, width, height); container = new ImageData(rgba, width, height);
} else if (data.length ===pixels * 4) { } else if (data.length === pixels * 4) {
container = new ImageData(data, width, height); container = new ImageData(
new Uint8ClampedArray(data.buffer, data.byteOffset, data.length),
width, height);
} else { } else {
throw new Error(`Unsupported channel count: ${data.length/pixels}`); throw new Error(`Unsupported channel count: ${data.length/pixels}`);
} }
} else if (data instanceof Float32Array) { } else if (data instanceof WebGLTexture) {
container = data;
} else {
throw new Error(`Unsupported format: ${data.constructor.name}`);
}
const image = new MPImage(
[container], /* ownsImageBitmap= */ false,
/* ownsWebGLTexture= */ false, this.graphRunner.wasmModule.canvas!,
this.shaderContext, width, height);
return shouldCopyData ? image.clone() : image;
}
/** Converts a WasmImage to an MPMask. */
protected convertToMPMask(wasmImage: WasmImage, shouldCopyData: boolean):
MPMask {
const {data, width, height} = wasmImage;
const pixels = width * height;
let container: WebGLTexture|Uint8Array|Float32Array;
if (data instanceof Uint8Array || data instanceof Float32Array) {
if (data.length === pixels) { if (data.length === pixels) {
container = data; // Mask container = data;
} else { } else {
throw new Error(`Unsupported channel count: ${data.length/pixels}`); throw new Error(`Unsupported channel count: ${data.length / pixels}`);
} }
} else { // WebGLTexture } else {
container = data; container = data;
} }
return new MPImage( const mask = new MPMask(
[container], /* ownsImageBitmap= */ false, /* ownsWebGLTexture= */ false, [container],
this.graphRunner.wasmModule.canvas!, this.shaderContext, width, /* ownsWebGLTexture= */ false, this.graphRunner.wasmModule.canvas!,
height); this.shaderContext, width, height);
return shouldCopyData ? mask.clone() : mask;
} }
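The `shouldCopyData` flag is what lets the new synchronous-return APIs hand out data that outlives the WASM-owned buffers. A hedged, free-standing sketch of the dispatch pattern the task subclasses in this commit build on top of these converters (`dispatchMask` is an illustrative name, not part of the API):

```ts
import {MPMask} from '../../../../tasks/web/vision/core/mask';  // path as imported above

// Copy only when there is no user callback: a returned result must survive
// past the listener invocation, while a callback result is only guaranteed
// for the duration of the callback.
function dispatchMask(
    convert: (shouldCopyData: boolean) => MPMask,
    userCallback?: (mask: MPMask) => void): MPMask|undefined {
  const mask = convert(/* shouldCopyData= */ !userCallback);
  if (userCallback) {
    userCallback(mask);  // valid only inside the callback
    return undefined;
  }
  return mask;  // already a copy; the caller should close() it when done
}
```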
/** Closes and cleans up the resources held by this task. */ /** Closes and cleans up the resources held by this task. */

View File

@ -47,7 +47,6 @@ mediapipe_ts_library(
"//mediapipe/framework:calculator_jspb_proto", "//mediapipe/framework:calculator_jspb_proto",
"//mediapipe/tasks/web/core", "//mediapipe/tasks/web/core",
"//mediapipe/tasks/web/core:task_runner_test_utils", "//mediapipe/tasks/web/core:task_runner_test_utils",
"//mediapipe/tasks/web/vision/core:image",
"//mediapipe/web/graph_runner:graph_runner_image_lib_ts", "//mediapipe/web/graph_runner:graph_runner_image_lib_ts",
], ],
) )

View File

@ -50,7 +50,8 @@ export type FaceStylizerCallback = (image: MPImage|null) => void;
/** Performs face stylization on images. */ /** Performs face stylization on images. */
export class FaceStylizer extends VisionTaskRunner { export class FaceStylizer extends VisionTaskRunner {
private userCallback: FaceStylizerCallback = () => {}; private userCallback?: FaceStylizerCallback;
private result?: MPImage|null;
private readonly options: FaceStylizerGraphOptionsProto; private readonly options: FaceStylizerGraphOptionsProto;
/** /**
@ -130,21 +131,58 @@ export class FaceStylizer extends VisionTaskRunner {
return super.applyOptions(options); return super.applyOptions(options);
} }
/** /**
* Performs face stylization on the provided single image. The method returns * Performs face stylization on the provided single image and invokes the
* synchronously once the callback returns. Only use this method when the * callback with the result. The method returns synchronously once the callback
* FaceStylizer is created with the image running mode. * returns. Only use this method when the FaceStylizer is created with the
* image running mode.
* *
* @param image An image to process. * @param image An image to process.
* @param callback The callback that is invoked with the stylized image. The * @param callback The callback that is invoked with the stylized image or
* lifetime of the returned data is only guaranteed for the duration of the * `null` if no face was detected. The lifetime of the returned data is
* callback. * only guaranteed for the duration of the callback.
*/ */
stylize(image: ImageSource, callback: FaceStylizerCallback): void; stylize(image: ImageSource, callback: FaceStylizerCallback): void;
/** /**
* Performs face stylization on the provided single image. The method returns * Performs face stylization on the provided single image and invokes the
* synchronously once the callback returns. Only use this method when the * callback with the result. The method returns synchronously once the callback
* returns. Only use this method when the FaceStylizer is created with the
* image running mode.
*
* The 'imageProcessingOptions' parameter can be used to specify one or all
* of:
* - the rotation to apply to the image before performing stylization, by
* setting its 'rotationDegrees' property.
* - the region-of-interest on which to perform stylization, by setting its
* 'regionOfInterest' property. If not specified, the full image is used.
* If both are specified, the crop around the region-of-interest is extracted
* first, then the specified rotation is applied to the crop.
*
* @param image An image to process.
* @param imageProcessingOptions the `ImageProcessingOptions` specifying how
* to process the input image before running inference.
* @param callback The callback that is invoked with the stylized image or
* `null` if no face was detected. The lifetime of the returned data is
* only guaranteed for the duration of the callback.
*/
stylize(
image: ImageSource, imageProcessingOptions: ImageProcessingOptions,
callback: FaceStylizerCallback): void;
/**
* Performs face stylization on the provided single image and returns the
* result. This method creates a copy of the resulting image and should not be
* used in high-throughput applications. Only use this method when the
* FaceStylizer is created with the image running mode.
*
* @param image An image to process.
* @return A stylized face or `null` if no face was detected. The result is
* copied to avoid lifetime issues.
*/
stylize(image: ImageSource): MPImage|null;
/**
* Performs face stylization on the provided single image and returns the
* result. This method creates a copy of the resulting image and should not be
* used in high-throughput applications. Only use this method when the
* FaceStylizer is created with the image running mode. * FaceStylizer is created with the image running mode.
* *
* The 'imageProcessingOptions' parameter can be used to specify one or all * The 'imageProcessingOptions' parameter can be used to specify one or all
@ -159,18 +197,16 @@ export class FaceStylizer extends VisionTaskRunner {
* @param image An image to process. * @param image An image to process.
* @param imageProcessingOptions the `ImageProcessingOptions` specifying how * @param imageProcessingOptions the `ImageProcessingOptions` specifying how
* to process the input image before running inference. * to process the input image before running inference.
* @param callback The callback that is invoked with the stylized image. The * @return A stylized face or `null` if no face was detected. The result is
* lifetime of the returned data is only guaranteed for the duration of the * copied to avoid lifetime issues.
* callback.
*/ */
stylize( stylize(image: ImageSource, imageProcessingOptions: ImageProcessingOptions):
image: ImageSource, imageProcessingOptions: ImageProcessingOptions, MPImage|null;
callback: FaceStylizerCallback): void;
stylize( stylize(
image: ImageSource, image: ImageSource,
imageProcessingOptionsOrCallback: ImageProcessingOptions| imageProcessingOptionsOrCallback?: ImageProcessingOptions|
FaceStylizerCallback, FaceStylizerCallback,
callback?: FaceStylizerCallback): void { callback?: FaceStylizerCallback): MPImage|null|void {
const imageProcessingOptions = const imageProcessingOptions =
typeof imageProcessingOptionsOrCallback !== 'function' ? typeof imageProcessingOptionsOrCallback !== 'function' ?
imageProcessingOptionsOrCallback : imageProcessingOptionsOrCallback :
@ -178,14 +214,19 @@ export class FaceStylizer extends VisionTaskRunner {
this.userCallback = typeof imageProcessingOptionsOrCallback === 'function' ? this.userCallback = typeof imageProcessingOptionsOrCallback === 'function' ?
imageProcessingOptionsOrCallback : imageProcessingOptionsOrCallback :
callback!; callback;
this.processImageData(image, imageProcessingOptions ?? {}); this.processImageData(image, imageProcessingOptions ?? {});
this.userCallback = () => {};
if (!this.userCallback) {
return this.result;
}
} }
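To make the two calling conventions above concrete, a hedged usage sketch; the package import and the already-initialized `stylizer` / `imageEl` values are assumptions:

```ts
import {FaceStylizer} from '@mediapipe/tasks-vision';  // published package name assumed

declare const stylizer: FaceStylizer;     // an initialized FaceStylizer (image mode)
declare const imageEl: HTMLImageElement;  // the image to stylize

// 1) Callback form: no copy is made, so the MPImage is only valid inside
//    the callback.
stylizer.stylize(imageEl, styled => {
  if (styled) {
    console.log(`Stylized image: ${styled.width}x${styled.height}`);
  }
});

// 2) Synchronous form: the result is a copy, so close() it when done.
const styledCopy = stylizer.stylize(imageEl);
if (styledCopy) {
  console.log(`Stylized copy: ${styledCopy.width}x${styledCopy.height}`);
  styledCopy.close();
}
```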
/** /**
* Performs face stylization on the provided video frame. Only use this method * Performs face stylization on the provided video frame and invokes the
* when the FaceStylizer is created with the video running mode. * callback with the result. The method returns synchronously once the callback
* returns. Only use this method when the FaceStylizer is created with the
* video running mode.
* *
* The input frame can be of any size. It's required to provide the video * The input frame can be of any size. It's required to provide the video
* frame's timestamp (in milliseconds). The input timestamps must be * frame's timestamp (in milliseconds). The input timestamps must be
@ -193,16 +234,18 @@ export class FaceStylizer extends VisionTaskRunner {
* *
* @param videoFrame A video frame to process. * @param videoFrame A video frame to process.
* @param timestamp The timestamp of the current frame, in ms. * @param timestamp The timestamp of the current frame, in ms.
* @param callback The callback that is invoked with the stylized image. The * @param callback The callback that is invoked with the stylized image or
* lifetime of the returned data is only guaranteed for the duration of * `null` if no face was detected. The lifetime of the returned data is only
* the callback. * guaranteed for the duration of the callback.
*/ */
stylizeForVideo( stylizeForVideo(
videoFrame: ImageSource, timestamp: number, videoFrame: ImageSource, timestamp: number,
callback: FaceStylizerCallback): void; callback: FaceStylizerCallback): void;
/** /**
* Performs face stylization on the provided video frame. Only use this * Performs face stylization on the provided video frame and invokes the
* method when the FaceStylizer is created with the video running mode. * callback with the result. The method returns synchronously once the callback
* returns. Only use this method when the FaceStylizer is created with the
* video running mode.
* *
* The 'imageProcessingOptions' parameter can be used to specify one or all * The 'imageProcessingOptions' parameter can be used to specify one or all
* of: * of:
@ -218,34 +261,83 @@ export class FaceStylizer extends VisionTaskRunner {
* monotonically increasing. * monotonically increasing.
* *
* @param videoFrame A video frame to process. * @param videoFrame A video frame to process.
* @param timestamp The timestamp of the current frame, in ms.
* @param imageProcessingOptions the `ImageProcessingOptions` specifying how * @param imageProcessingOptions the `ImageProcessingOptions` specifying how
* to process the input image before running inference. * to process the input image before running inference.
* @param timestamp The timestamp of the current frame, in ms. * @param callback The callback that is invoked with the stylized image or
* @param callback The callback that is invoked with the stylized image. The * `null` if no face was detected. The lifetime of the returned data is only
* lifetime of the returned data is only guaranteed for the duration of * guaranteed for the duration of the callback.
* the callback.
*/ */
stylizeForVideo( stylizeForVideo(
videoFrame: ImageSource, imageProcessingOptions: ImageProcessingOptions, videoFrame: ImageSource, timestamp: number,
timestamp: number, callback: FaceStylizerCallback): void; imageProcessingOptions: ImageProcessingOptions,
callback: FaceStylizerCallback): void;
/**
* Performs face stylization on the provided video frame. This method creates
* a copy of the resulting image and should not be used in high-throughput
* applications. Only use this method when the FaceStylizer is created with the
* video running mode.
*
* The input frame can be of any size. It's required to provide the video
* frame's timestamp (in milliseconds). The input timestamps must be
* monotonically increasing.
*
* @param videoFrame A video frame to process.
* @param timestamp The timestamp of the current frame, in ms.
* @return A stylized face or `null` if no face was detected. The result is
* copied to avoid lifetime issues.
*/
stylizeForVideo(videoFrame: ImageSource, timestamp: number): MPImage|null;
/**
* Performs face stylization on the provided video frame. This method creates
* a copy of the resulting image and should not be used in high-throughput
* applications. Only use this method when the FaceStylizer is created with the
* video running mode.
*
* The 'imageProcessingOptions' parameter can be used to specify one or all
* of:
* - the rotation to apply to the image before performing stylization, by
* setting its 'rotationDegrees' property.
* - the region-of-interest on which to perform stylization, by setting its
* 'regionOfInterest' property. If not specified, the full image is used.
* If both are specified, the crop around the region-of-interest is
* extracted first, then the specified rotation is applied to the crop.
*
* The input frame can be of any size. It's required to provide the video
* frame's timestamp (in milliseconds). The input timestamps must be
* monotonically increasing.
*
* @param videoFrame A video frame to process.
* @param timestamp The timestamp of the current frame, in ms.
* @param imageProcessingOptions the `ImageProcessingOptions` specifying how
* to process the input image before running inference.
* @return A stylized face or `null` if no face was detected. The result is
* copied to avoid lifetime issues.
*/
stylizeForVideo( stylizeForVideo(
videoFrame: ImageSource, videoFrame: ImageSource,
timestampOrImageProcessingOptions: number|ImageProcessingOptions, timestamp: number,
timestampOrCallback: number|FaceStylizerCallback, imageProcessingOptions: ImageProcessingOptions,
callback?: FaceStylizerCallback): void { ): MPImage|null;
stylizeForVideo(
videoFrame: ImageSource, timestamp: number,
imageProcessingOptionsOrCallback?: ImageProcessingOptions|
FaceStylizerCallback,
callback?: FaceStylizerCallback): MPImage|null|void {
const imageProcessingOptions = const imageProcessingOptions =
typeof timestampOrImageProcessingOptions !== 'number' ? typeof imageProcessingOptionsOrCallback !== 'function' ?
timestampOrImageProcessingOptions : imageProcessingOptionsOrCallback :
{}; {};
const timestamp = typeof timestampOrImageProcessingOptions === 'number' ?
timestampOrImageProcessingOptions :
timestampOrCallback as number;
this.userCallback = typeof timestampOrCallback === 'function' ? this.userCallback = typeof imageProcessingOptionsOrCallback === 'function' ?
timestampOrCallback : imageProcessingOptionsOrCallback :
callback!; callback;
this.processVideoData(videoFrame, imageProcessingOptions, timestamp); this.processVideoData(videoFrame, imageProcessingOptions, timestamp);
this.userCallback = () => {}; this.userCallback = undefined;
if (!this.userCallback) {
return this.result;
}
} }
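For the video running mode, timestamps must be monotonically increasing; a hedged sketch of a render loop using the synchronous form (again, `stylizer` and `video` are assumed to be set up already):

```ts
import {FaceStylizer} from '@mediapipe/tasks-vision';  // published package name assumed

declare const stylizer: FaceStylizer;   // created with the video running mode
declare const video: HTMLVideoElement;  // a playing video element

function renderLoop(): void {
  // performance.now() is monotonically increasing, as required.
  const styledFrame = stylizer.stylizeForVideo(video, performance.now());
  if (styledFrame) {
    // ... draw or inspect the stylized frame, then release the copy.
    styledFrame.close();
  }
  requestAnimationFrame(renderLoop);
}
requestAnimationFrame(renderLoop);
```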
/** Updates the MediaPipe graph configuration. */ /** Updates the MediaPipe graph configuration. */
@ -270,13 +362,20 @@ export class FaceStylizer extends VisionTaskRunner {
this.graphRunner.attachImageListener( this.graphRunner.attachImageListener(
STYLIZED_IMAGE_STREAM, (wasmImage, timestamp) => { STYLIZED_IMAGE_STREAM, (wasmImage, timestamp) => {
const mpImage = this.convertToMPImage(wasmImage); const mpImage = this.convertToMPImage(
wasmImage, /* shouldCopyData= */ !this.userCallback);
this.result = mpImage;
if (this.userCallback) {
this.userCallback(mpImage); this.userCallback(mpImage);
}
this.setLatestOutputTimestamp(timestamp); this.setLatestOutputTimestamp(timestamp);
}); });
this.graphRunner.attachEmptyPacketListener( this.graphRunner.attachEmptyPacketListener(
STYLIZED_IMAGE_STREAM, timestamp => { STYLIZED_IMAGE_STREAM, timestamp => {
this.result = null;
if (this.userCallback) {
this.userCallback(null); this.userCallback(null);
}
this.setLatestOutputTimestamp(timestamp); this.setLatestOutputTimestamp(timestamp);
}); });

View File

@ -19,7 +19,6 @@ import 'jasmine';
// Placeholder for internal dependency on encodeByteArray // Placeholder for internal dependency on encodeByteArray
import {CalculatorGraphConfig} from '../../../../framework/calculator_pb'; import {CalculatorGraphConfig} from '../../../../framework/calculator_pb';
import {addJasmineCustomFloatEqualityTester, createSpyWasmModule, MediapipeTasksFake, SpyWasmModule, verifyGraph, verifyListenersRegistered} from '../../../../tasks/web/core/task_runner_test_utils'; import {addJasmineCustomFloatEqualityTester, createSpyWasmModule, MediapipeTasksFake, SpyWasmModule, verifyGraph, verifyListenersRegistered} from '../../../../tasks/web/core/task_runner_test_utils';
import {MPImage} from '../../../../tasks/web/vision/core/image';
import {WasmImage} from '../../../../web/graph_runner/graph_runner_image_lib'; import {WasmImage} from '../../../../web/graph_runner/graph_runner_image_lib';
import {FaceStylizer} from './face_stylizer'; import {FaceStylizer} from './face_stylizer';
@ -99,6 +98,30 @@ describe('FaceStylizer', () => {
]); ]);
}); });
it('returns result', () => {
if (typeof ImageData === 'undefined') {
console.log('ImageData tests are not supported on Node');
return;
}
// Pass the test data to our listener
faceStylizer.fakeWasmModule._waitUntilIdle.and.callFake(() => {
verifyListenersRegistered(faceStylizer);
faceStylizer.imageListener!
({data: new Uint8Array([1, 1, 1, 1]), width: 1, height: 1},
/* timestamp= */ 1337);
});
// Invoke the face stylizer
const image = faceStylizer.stylize({} as HTMLImageElement);
expect(faceStylizer.fakeWasmModule._waitUntilIdle).toHaveBeenCalled();
expect(image).not.toBeNull();
expect(image!.hasImageData()).toBeTrue();
expect(image!.width).toEqual(1);
expect(image!.height).toEqual(1);
image!.close();
});
it('invokes callback', (done) => { it('invokes callback', (done) => {
if (typeof ImageData === 'undefined') { if (typeof ImageData === 'undefined') {
console.log('ImageData tests are not supported on Node'); console.log('ImageData tests are not supported on Node');
@ -110,7 +133,7 @@ describe('FaceStylizer', () => {
faceStylizer.fakeWasmModule._waitUntilIdle.and.callFake(() => { faceStylizer.fakeWasmModule._waitUntilIdle.and.callFake(() => {
verifyListenersRegistered(faceStylizer); verifyListenersRegistered(faceStylizer);
faceStylizer.imageListener! faceStylizer.imageListener!
({data: new Uint8ClampedArray([1, 1, 1, 1]), width: 1, height: 1}, ({data: new Uint8Array([1, 1, 1, 1]), width: 1, height: 1},
/* timestamp= */ 1337); /* timestamp= */ 1337);
}); });
@ -118,35 +141,14 @@ describe('FaceStylizer', () => {
faceStylizer.stylize({} as HTMLImageElement, image => { faceStylizer.stylize({} as HTMLImageElement, image => {
expect(faceStylizer.fakeWasmModule._waitUntilIdle).toHaveBeenCalled(); expect(faceStylizer.fakeWasmModule._waitUntilIdle).toHaveBeenCalled();
expect(image).not.toBeNull(); expect(image).not.toBeNull();
expect(image!.has(MPImage.TYPE.IMAGE_DATA)).toBeTrue(); expect(image!.hasImageData()).toBeTrue();
expect(image!.width).toEqual(1); expect(image!.width).toEqual(1);
expect(image!.height).toEqual(1); expect(image!.height).toEqual(1);
done(); done();
}); });
}); });
it('invokes callback even when no faes are detected', (done) => { it('invokes callback even when no faces are detected', (done) => {
if (typeof ImageData === 'undefined') {
console.log('ImageData tests are not supported on Node');
done();
return;
}
// Pass the test data to our listener
faceStylizer.fakeWasmModule._waitUntilIdle.and.callFake(() => {
verifyListenersRegistered(faceStylizer);
faceStylizer.emptyPacketListener!(/* timestamp= */ 1337);
});
// Invoke the face stylizeer
faceStylizer.stylize({} as HTMLImageElement, image => {
expect(faceStylizer.fakeWasmModule._waitUntilIdle).toHaveBeenCalled();
expect(image).toBeNull();
done();
});
});
it('invokes callback even when no faes are detected', (done) => {
// Pass the test data to our listener // Pass the test data to our listener
faceStylizer.fakeWasmModule._waitUntilIdle.and.callFake(() => { faceStylizer.fakeWasmModule._waitUntilIdle.and.callFake(() => {
verifyListenersRegistered(faceStylizer); verifyListenersRegistered(faceStylizer);

View File

@ -35,7 +35,7 @@ mediapipe_ts_declaration(
deps = [ deps = [
"//mediapipe/tasks/web/core", "//mediapipe/tasks/web/core",
"//mediapipe/tasks/web/core:classifier_options", "//mediapipe/tasks/web/core:classifier_options",
"//mediapipe/tasks/web/vision/core:image", "//mediapipe/tasks/web/vision/core:mask",
"//mediapipe/tasks/web/vision/core:vision_task_options", "//mediapipe/tasks/web/vision/core:vision_task_options",
], ],
) )
@ -52,7 +52,7 @@ mediapipe_ts_library(
"//mediapipe/framework:calculator_jspb_proto", "//mediapipe/framework:calculator_jspb_proto",
"//mediapipe/tasks/web/core", "//mediapipe/tasks/web/core",
"//mediapipe/tasks/web/core:task_runner_test_utils", "//mediapipe/tasks/web/core:task_runner_test_utils",
"//mediapipe/tasks/web/vision/core:image", "//mediapipe/tasks/web/vision/core:mask",
"//mediapipe/web/graph_runner:graph_runner_image_lib_ts", "//mediapipe/web/graph_runner:graph_runner_image_lib_ts",
], ],
) )

View File

@ -60,7 +60,7 @@ export type ImageSegmenterCallback = (result: ImageSegmenterResult) => void;
export class ImageSegmenter extends VisionTaskRunner { export class ImageSegmenter extends VisionTaskRunner {
private result: ImageSegmenterResult = {}; private result: ImageSegmenterResult = {};
private labels: string[] = []; private labels: string[] = [];
private userCallback: ImageSegmenterCallback = () => {}; private userCallback?: ImageSegmenterCallback;
private outputCategoryMask = DEFAULT_OUTPUT_CATEGORY_MASK; private outputCategoryMask = DEFAULT_OUTPUT_CATEGORY_MASK;
private outputConfidenceMasks = DEFAULT_OUTPUT_CONFIDENCE_MASKS; private outputConfidenceMasks = DEFAULT_OUTPUT_CONFIDENCE_MASKS;
private readonly options: ImageSegmenterGraphOptionsProto; private readonly options: ImageSegmenterGraphOptionsProto;
@ -224,22 +224,51 @@ export class ImageSegmenter extends VisionTaskRunner {
segment( segment(
image: ImageSource, imageProcessingOptions: ImageProcessingOptions, image: ImageSource, imageProcessingOptions: ImageProcessingOptions,
callback: ImageSegmenterCallback): void; callback: ImageSegmenterCallback): void;
/**
* Performs image segmentation on the provided single image and returns the
* segmentation result. This method creates a copy of the resulting masks and
* should not be used in high-throughput applications. Only use this method
* when the ImageSegmenter is created with running mode `image`.
*
* @param image An image to process.
* @return The segmentation result. The data is copied to avoid lifetime
* issues.
*/
segment(image: ImageSource): ImageSegmenterResult;
/**
* Performs image segmentation on the provided single image and returns the
* segmentation result. This method creates a copy of the resulting masks and
* should not be used in high-throughput applications. Only use this method when
* the ImageSegmenter is created with running mode `image`.
*
* @param image An image to process.
* @param imageProcessingOptions the `ImageProcessingOptions` specifying how
* to process the input image before running inference.
* @return The segmentation result. The data is copied to avoid lifetime
* issues.
*/
segment(image: ImageSource, imageProcessingOptions: ImageProcessingOptions):
ImageSegmenterResult;
segment( segment(
image: ImageSource, image: ImageSource,
imageProcessingOptionsOrCallback: ImageProcessingOptions| imageProcessingOptionsOrCallback?: ImageProcessingOptions|
ImageSegmenterCallback, ImageSegmenterCallback,
callback?: ImageSegmenterCallback): void { callback?: ImageSegmenterCallback): ImageSegmenterResult|void {
const imageProcessingOptions = const imageProcessingOptions =
typeof imageProcessingOptionsOrCallback !== 'function' ? typeof imageProcessingOptionsOrCallback !== 'function' ?
imageProcessingOptionsOrCallback : imageProcessingOptionsOrCallback :
{}; {};
this.userCallback = typeof imageProcessingOptionsOrCallback === 'function' ? this.userCallback = typeof imageProcessingOptionsOrCallback === 'function' ?
imageProcessingOptionsOrCallback : imageProcessingOptionsOrCallback :
callback!; callback;
this.reset(); this.reset();
this.processImageData(image, imageProcessingOptions); this.processImageData(image, imageProcessingOptions);
this.userCallback = () => {};
if (!this.userCallback) {
return this.result;
}
} }
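For orientation, here is a hedged sketch of the two calling conventions this overload set now supports; `imageSegmenter` and `image` are assumed to exist in scope (e.g. a segmenter created in `image` running mode and an `HTMLImageElement`):

```ts
// Callback form: masks are only valid for the duration of the callback.
imageSegmenter.segment(image, result => {
  const mask = result.confidenceMasks?.[0];
  console.log('first confidence mask:', mask?.width, mask?.height);
});

// Callback-free form: masks are copied, so the caller owns them and must close them.
const result = imageSegmenter.segment(image);
result.confidenceMasks?.forEach(mask => mask.close());
result.categoryMask?.close();
```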
/** /**
@ -264,35 +293,64 @@ export class ImageSegmenter extends VisionTaskRunner {
* created with running mode `video`. * created with running mode `video`.
* *
* @param videoFrame A video frame to process. * @param videoFrame A video frame to process.
* @param imageProcessingOptions the `ImageProcessingOptions` specifying how
* to process the input image before running inference.
* @param timestamp The timestamp of the current frame, in ms. * @param timestamp The timestamp of the current frame, in ms.
* @param imageProcessingOptions the `ImageProcessingOptions` specifying how
* to process the input frame before running inference.
* @param callback The callback that is invoked with the segmented masks. The * @param callback The callback that is invoked with the segmented masks. The
* lifetime of the returned data is only guaranteed for the duration of the * lifetime of the returned data is only guaranteed for the duration of the
* callback. * callback.
*/ */
segmentForVideo( segmentForVideo(
videoFrame: ImageSource, imageProcessingOptions: ImageProcessingOptions, videoFrame: ImageSource, timestamp: number,
timestamp: number, callback: ImageSegmenterCallback): void; imageProcessingOptions: ImageProcessingOptions,
callback: ImageSegmenterCallback): void;
/**
* Performs image segmentation on the provided video frame and returns the
* segmentation result. This method creates a copy of the resulting masks and
* should not be used in high-throughput applications. Only use this method
* when the ImageSegmenter is created with running mode `video`.
*
* @param videoFrame A video frame to process.
* @return The segmentation result. The data is copied to avoid lifetime
* issues.
*/
segmentForVideo(videoFrame: ImageSource, timestamp: number):
ImageSegmenterResult;
/**
* Performs image segmentation on the provided video frame and returns the
* segmentation result. This method creates a copy of the resulting masks and
* should not be used in high-throughput applications. Only use this method when
* the ImageSegmenter is created with running mode `video`.
*
* @param videoFrame A video frame to process.
* @param timestamp The timestamp of the current frame, in ms.
* @param imageProcessingOptions the `ImageProcessingOptions` specifying how
* to process the input frame before running inference.
* @return The segmentation result. The data is copied to avoid lifetime
* issues.
*/
segmentForVideo( segmentForVideo(
videoFrame: ImageSource, videoFrame: ImageSource, timestamp: number,
timestampOrImageProcessingOptions: number|ImageProcessingOptions, imageProcessingOptions: ImageProcessingOptions): ImageSegmenterResult;
timestampOrCallback: number|ImageSegmenterCallback, segmentForVideo(
callback?: ImageSegmenterCallback): void { videoFrame: ImageSource, timestamp: number,
imageProcessingOptionsOrCallback?: ImageProcessingOptions|
ImageSegmenterCallback,
callback?: ImageSegmenterCallback): ImageSegmenterResult|void {
const imageProcessingOptions = const imageProcessingOptions =
typeof timestampOrImageProcessingOptions !== 'number' ? typeof imageProcessingOptionsOrCallback !== 'function' ?
timestampOrImageProcessingOptions : imageProcessingOptionsOrCallback :
{}; {};
const timestamp = typeof timestampOrImageProcessingOptions === 'number' ? this.userCallback = typeof imageProcessingOptionsOrCallback === 'function' ?
timestampOrImageProcessingOptions : imageProcessingOptionsOrCallback :
timestampOrCallback as number; callback;
this.userCallback = typeof timestampOrCallback === 'function' ?
timestampOrCallback :
callback!;
this.reset(); this.reset();
this.processVideoData(videoFrame, imageProcessingOptions, timestamp); this.processVideoData(videoFrame, imageProcessingOptions, timestamp);
this.userCallback = () => {};
if (!this.userCallback) {
return this.result;
}
} }
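A similarly hedged sketch of the video path with the reordered signature (the timestamp now precedes the optional `ImageProcessingOptions`); `imageSegmenter` is assumed to run in `video` mode and `video` to be an `HTMLVideoElement`:

```ts
function segmentFrame(): void {
  // Callback-free call: the returned masks are copies and must be closed.
  const result = imageSegmenter.segmentForVideo(video, performance.now());
  result.categoryMask?.close();
  result.confidenceMasks?.forEach(mask => mask.close());
  requestAnimationFrame(segmentFrame);
}
requestAnimationFrame(segmentFrame);
```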
/** /**
@ -323,8 +381,10 @@ export class ImageSegmenter extends VisionTaskRunner {
return; return;
} }
if (this.userCallback) {
this.userCallback(this.result); this.userCallback(this.result);
} }
}
/** Updates the MediaPipe graph configuration. */ /** Updates the MediaPipe graph configuration. */
protected override refreshGraph(): void { protected override refreshGraph(): void {
@ -351,8 +411,9 @@ export class ImageSegmenter extends VisionTaskRunner {
this.graphRunner.attachImageVectorListener( this.graphRunner.attachImageVectorListener(
CONFIDENCE_MASKS_STREAM, (masks, timestamp) => { CONFIDENCE_MASKS_STREAM, (masks, timestamp) => {
this.result.confidenceMasks = this.result.confidenceMasks = masks.map(
masks.map(wasmImage => this.convertToMPImage(wasmImage)); wasmImage => this.convertToMPMask(
wasmImage, /* shouldCopyData= */ !this.userCallback));
this.setLatestOutputTimestamp(timestamp); this.setLatestOutputTimestamp(timestamp);
this.maybeInvokeCallback(); this.maybeInvokeCallback();
}); });
@ -370,7 +431,8 @@ export class ImageSegmenter extends VisionTaskRunner {
this.graphRunner.attachImageListener( this.graphRunner.attachImageListener(
CATEGORY_MASK_STREAM, (mask, timestamp) => { CATEGORY_MASK_STREAM, (mask, timestamp) => {
this.result.categoryMask = this.convertToMPImage(mask); this.result.categoryMask = this.convertToMPMask(
mask, /* shouldCopyData= */ !this.userCallback);
this.setLatestOutputTimestamp(timestamp); this.setLatestOutputTimestamp(timestamp);
this.maybeInvokeCallback(); this.maybeInvokeCallback();
}); });

View File

@ -14,7 +14,7 @@
* limitations under the License. * limitations under the License.
*/ */
import {MPImage} from '../../../../tasks/web/vision/core/image'; import {MPMask} from '../../../../tasks/web/vision/core/mask';
/** The output result of ImageSegmenter. */ /** The output result of ImageSegmenter. */
export declare interface ImageSegmenterResult { export declare interface ImageSegmenterResult {
@ -23,12 +23,12 @@ export declare interface ImageSegmenterResult {
* `MPImage`s where, for each mask, each pixel represents the prediction * `MPImage`s where, for each mask, each pixel represents the prediction
* confidence, usually in the [0, 1] range. * confidence, usually in the [0, 1] range.
*/ */
confidenceMasks?: MPImage[]; confidenceMasks?: MPMask[];
/** /**
* A category mask represented as a `Uint8ClampedArray` or * A category mask represented as a `Uint8ClampedArray` or
* `WebGLTexture`-backed `MPImage` where each pixel represents the class which * `WebGLTexture`-backed `MPImage` where each pixel represents the class which
* the pixel in the original image was predicted to belong to. * the pixel in the original image was predicted to belong to.
*/ */
categoryMask?: MPImage; categoryMask?: MPMask;
} }
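Since both mask fields are now `MPMask`-typed (and optional, depending on which outputs were enabled), downstream code can treat the result like this; a small illustrative helper, not part of the diff:

```ts
function logMasks(result: ImageSegmenterResult): void {
  result.confidenceMasks?.forEach(
      (mask, i) => console.log(`confidence mask ${i}: ${mask.width}x${mask.height}`));
  if (result.categoryMask) {
    console.log(`category mask: ${result.categoryMask.width}x${result.categoryMask.height}`);
  }
}
```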

View File

@ -19,8 +19,8 @@ import 'jasmine';
// Placeholder for internal dependency on encodeByteArray // Placeholder for internal dependency on encodeByteArray
import {CalculatorGraphConfig} from '../../../../framework/calculator_pb'; import {CalculatorGraphConfig} from '../../../../framework/calculator_pb';
import {addJasmineCustomFloatEqualityTester, createSpyWasmModule, MediapipeTasksFake, SpyWasmModule, verifyGraph} from '../../../../tasks/web/core/task_runner_test_utils'; import {addJasmineCustomFloatEqualityTester, createSpyWasmModule, MediapipeTasksFake, SpyWasmModule, verifyGraph} from '../../../../tasks/web/core/task_runner_test_utils';
import {MPMask} from '../../../../tasks/web/vision/core/mask';
import {WasmImage} from '../../../../web/graph_runner/graph_runner_image_lib'; import {WasmImage} from '../../../../web/graph_runner/graph_runner_image_lib';
import {MPImage} from '../../../../tasks/web/vision/core/image';
import {ImageSegmenter} from './image_segmenter'; import {ImageSegmenter} from './image_segmenter';
import {ImageSegmenterOptions} from './image_segmenter_options'; import {ImageSegmenterOptions} from './image_segmenter_options';
@ -165,7 +165,7 @@ describe('ImageSegmenter', () => {
}); });
it('supports category mask', async () => { it('supports category mask', async () => {
const mask = new Uint8ClampedArray([1, 2, 3, 4]); const mask = new Uint8Array([1, 2, 3, 4]);
await imageSegmenter.setOptions( await imageSegmenter.setOptions(
{outputCategoryMask: true, outputConfidenceMasks: false}); {outputCategoryMask: true, outputConfidenceMasks: false});
@ -183,7 +183,7 @@ describe('ImageSegmenter', () => {
return new Promise<void>(resolve => { return new Promise<void>(resolve => {
imageSegmenter.segment({} as HTMLImageElement, result => { imageSegmenter.segment({} as HTMLImageElement, result => {
expect(imageSegmenter.fakeWasmModule._waitUntilIdle).toHaveBeenCalled(); expect(imageSegmenter.fakeWasmModule._waitUntilIdle).toHaveBeenCalled();
expect(result.categoryMask).toBeInstanceOf(MPImage); expect(result.categoryMask).toBeInstanceOf(MPMask);
expect(result.confidenceMasks).not.toBeDefined(); expect(result.confidenceMasks).not.toBeDefined();
expect(result.categoryMask!.width).toEqual(2); expect(result.categoryMask!.width).toEqual(2);
expect(result.categoryMask!.height).toEqual(2); expect(result.categoryMask!.height).toEqual(2);
@ -216,18 +216,18 @@ describe('ImageSegmenter', () => {
expect(imageSegmenter.fakeWasmModule._waitUntilIdle).toHaveBeenCalled(); expect(imageSegmenter.fakeWasmModule._waitUntilIdle).toHaveBeenCalled();
expect(result.categoryMask).not.toBeDefined(); expect(result.categoryMask).not.toBeDefined();
expect(result.confidenceMasks![0]).toBeInstanceOf(MPImage); expect(result.confidenceMasks![0]).toBeInstanceOf(MPMask);
expect(result.confidenceMasks![0].width).toEqual(2); expect(result.confidenceMasks![0].width).toEqual(2);
expect(result.confidenceMasks![0].height).toEqual(2); expect(result.confidenceMasks![0].height).toEqual(2);
expect(result.confidenceMasks![1]).toBeInstanceOf(MPImage); expect(result.confidenceMasks![1]).toBeInstanceOf(MPMask);
resolve(); resolve();
}); });
}); });
}); });
it('supports combined category and confidence masks', async () => { it('supports combined category and confidence masks', async () => {
const categoryMask = new Uint8ClampedArray([1]); const categoryMask = new Uint8Array([1]);
const confidenceMask1 = new Float32Array([0.0]); const confidenceMask1 = new Float32Array([0.0]);
const confidenceMask2 = new Float32Array([1.0]); const confidenceMask2 = new Float32Array([1.0]);
@ -252,19 +252,19 @@ describe('ImageSegmenter', () => {
// Invoke the image segmenter // Invoke the image segmenter
imageSegmenter.segment({} as HTMLImageElement, result => { imageSegmenter.segment({} as HTMLImageElement, result => {
expect(imageSegmenter.fakeWasmModule._waitUntilIdle).toHaveBeenCalled(); expect(imageSegmenter.fakeWasmModule._waitUntilIdle).toHaveBeenCalled();
expect(result.categoryMask).toBeInstanceOf(MPImage); expect(result.categoryMask).toBeInstanceOf(MPMask);
expect(result.categoryMask!.width).toEqual(1); expect(result.categoryMask!.width).toEqual(1);
expect(result.categoryMask!.height).toEqual(1); expect(result.categoryMask!.height).toEqual(1);
expect(result.confidenceMasks![0]).toBeInstanceOf(MPImage); expect(result.confidenceMasks![0]).toBeInstanceOf(MPMask);
expect(result.confidenceMasks![1]).toBeInstanceOf(MPImage); expect(result.confidenceMasks![1]).toBeInstanceOf(MPMask);
resolve(); resolve();
}); });
}); });
}); });
it('invokes listener once masks are avaiblae', async () => { it('invokes listener once masks are available', async () => {
const categoryMask = new Uint8ClampedArray([1]); const categoryMask = new Uint8Array([1]);
const confidenceMask = new Float32Array([0.0]); const confidenceMask = new Float32Array([0.0]);
let listenerCalled = false; let listenerCalled = false;
@ -292,4 +292,21 @@ describe('ImageSegmenter', () => {
}); });
}); });
}); });
it('returns result', () => {
const confidenceMask = new Float32Array([0.0]);
// Pass the test data to our listener
imageSegmenter.fakeWasmModule._waitUntilIdle.and.callFake(() => {
imageSegmenter.confidenceMasksListener!(
[
{data: confidenceMask, width: 1, height: 1},
],
1337);
});
const result = imageSegmenter.segment({} as HTMLImageElement);
expect(result.confidenceMasks![0]).toBeInstanceOf(MPMask);
result.confidenceMasks![0].close();
});
}); });

View File

@ -16,7 +16,8 @@
import {FilesetResolver as FilesetResolverImpl} from '../../../tasks/web/core/fileset_resolver'; import {FilesetResolver as FilesetResolverImpl} from '../../../tasks/web/core/fileset_resolver';
import {DrawingUtils as DrawingUtilsImpl} from '../../../tasks/web/vision/core/drawing_utils'; import {DrawingUtils as DrawingUtilsImpl} from '../../../tasks/web/vision/core/drawing_utils';
import {MPImage as MPImageImpl, MPImageType as MPImageTypeImpl} from '../../../tasks/web/vision/core/image'; import {MPImage as MPImageImpl} from '../../../tasks/web/vision/core/image';
import {MPMask as MPMaskImpl} from '../../../tasks/web/vision/core/mask';
import {FaceDetector as FaceDetectorImpl} from '../../../tasks/web/vision/face_detector/face_detector'; import {FaceDetector as FaceDetectorImpl} from '../../../tasks/web/vision/face_detector/face_detector';
import {FaceLandmarker as FaceLandmarkerImpl, FaceLandmarksConnections as FaceLandmarksConnectionsImpl} from '../../../tasks/web/vision/face_landmarker/face_landmarker'; import {FaceLandmarker as FaceLandmarkerImpl, FaceLandmarksConnections as FaceLandmarksConnectionsImpl} from '../../../tasks/web/vision/face_landmarker/face_landmarker';
import {FaceStylizer as FaceStylizerImpl} from '../../../tasks/web/vision/face_stylizer/face_stylizer'; import {FaceStylizer as FaceStylizerImpl} from '../../../tasks/web/vision/face_stylizer/face_stylizer';
@ -34,7 +35,7 @@ import {PoseLandmarker as PoseLandmarkerImpl} from '../../../tasks/web/vision/po
const DrawingUtils = DrawingUtilsImpl; const DrawingUtils = DrawingUtilsImpl;
const FilesetResolver = FilesetResolverImpl; const FilesetResolver = FilesetResolverImpl;
const MPImage = MPImageImpl; const MPImage = MPImageImpl;
const MPImageType = MPImageTypeImpl; const MPMask = MPMaskImpl;
const FaceDetector = FaceDetectorImpl; const FaceDetector = FaceDetectorImpl;
const FaceLandmarker = FaceLandmarkerImpl; const FaceLandmarker = FaceLandmarkerImpl;
const FaceLandmarksConnections = FaceLandmarksConnectionsImpl; const FaceLandmarksConnections = FaceLandmarksConnectionsImpl;
@ -52,7 +53,7 @@ export {
DrawingUtils, DrawingUtils,
FilesetResolver, FilesetResolver,
MPImage, MPImage,
MPImageType, MPMask,
FaceDetector, FaceDetector,
FaceLandmarker, FaceLandmarker,
FaceLandmarksConnections, FaceLandmarksConnections,

View File

@ -37,7 +37,7 @@ mediapipe_ts_declaration(
deps = [ deps = [
"//mediapipe/tasks/web/core", "//mediapipe/tasks/web/core",
"//mediapipe/tasks/web/core:classifier_options", "//mediapipe/tasks/web/core:classifier_options",
"//mediapipe/tasks/web/vision/core:image", "//mediapipe/tasks/web/vision/core:mask",
"//mediapipe/tasks/web/vision/core:vision_task_options", "//mediapipe/tasks/web/vision/core:vision_task_options",
], ],
) )
@ -54,7 +54,7 @@ mediapipe_ts_library(
"//mediapipe/framework:calculator_jspb_proto", "//mediapipe/framework:calculator_jspb_proto",
"//mediapipe/tasks/web/core", "//mediapipe/tasks/web/core",
"//mediapipe/tasks/web/core:task_runner_test_utils", "//mediapipe/tasks/web/core:task_runner_test_utils",
"//mediapipe/tasks/web/vision/core:image", "//mediapipe/tasks/web/vision/core:mask",
"//mediapipe/util:render_data_jspb_proto", "//mediapipe/util:render_data_jspb_proto",
"//mediapipe/web/graph_runner:graph_runner_image_lib_ts", "//mediapipe/web/graph_runner:graph_runner_image_lib_ts",
], ],

View File

@ -86,7 +86,7 @@ export class InteractiveSegmenter extends VisionTaskRunner {
private result: InteractiveSegmenterResult = {}; private result: InteractiveSegmenterResult = {};
private outputCategoryMask = DEFAULT_OUTPUT_CATEGORY_MASK; private outputCategoryMask = DEFAULT_OUTPUT_CATEGORY_MASK;
private outputConfidenceMasks = DEFAULT_OUTPUT_CONFIDENCE_MASKS; private outputConfidenceMasks = DEFAULT_OUTPUT_CONFIDENCE_MASKS;
private userCallback: InteractiveSegmenterCallback = () => {}; private userCallback?: InteractiveSegmenterCallback;
private readonly options: ImageSegmenterGraphOptionsProto; private readonly options: ImageSegmenterGraphOptionsProto;
private readonly segmenterOptions: SegmenterOptionsProto; private readonly segmenterOptions: SegmenterOptionsProto;
@ -186,14 +186,9 @@ export class InteractiveSegmenter extends VisionTaskRunner {
/** /**
* Performs interactive segmentation on the provided single image and invokes * Performs interactive segmentation on the provided single image and invokes
* the callback with the response. The `roi` parameter is used to represent a * the callback with the response. The method returns synchronously once the
* user's region of interest for segmentation. * callback returns. The `roi` parameter is used to represent a user's region
* * of interest for segmentation.
* If the output_type is `CATEGORY_MASK`, the callback is invoked with vector
* of images that represent per-category segmented image mask. If the
* output_type is `CONFIDENCE_MASK`, the callback is invoked with a vector of
* images that contains only one confidence image mask. The method returns
* synchronously once the callback returns.
* *
* @param image An image to process. * @param image An image to process.
* @param roi The region of interest for segmentation. * @param roi The region of interest for segmentation.
@ -206,8 +201,9 @@ export class InteractiveSegmenter extends VisionTaskRunner {
callback: InteractiveSegmenterCallback): void; callback: InteractiveSegmenterCallback): void;
/** /**
* Performs interactive segmentation on the provided single image and invokes * Performs interactive segmentation on the provided single image and invokes
* the callback with the response. The `roi` parameter is used to represent a * the callback with the response. The method returns synchronously once the
* user's region of interest for segmentation. * callback returns. The `roi` parameter is used to represent a user's region
* of interest for segmentation.
* *
* The 'image_processing_options' parameter can be used to specify the * The 'image_processing_options' parameter can be used to specify the
* rotation to apply to the image before performing segmentation, by setting * rotation to apply to the image before performing segmentation, by setting
@ -215,12 +211,6 @@ export class InteractiveSegmenter extends VisionTaskRunner {
* using the 'regionOfInterest' field is NOT supported and will result in an * using the 'regionOfInterest' field is NOT supported and will result in an
* error. * error.
* *
* If the output_type is `CATEGORY_MASK`, the callback is invoked with vector
* of images that represent per-category segmented image mask. If the
* output_type is `CONFIDENCE_MASK`, the callback is invoked with a vector of
* images that contains only one confidence image mask. The method returns
* synchronously once the callback returns.
*
* @param image An image to process. * @param image An image to process.
* @param roi The region of interest for segmentation. * @param roi The region of interest for segmentation.
* @param imageProcessingOptions the `ImageProcessingOptions` specifying how * @param imageProcessingOptions the `ImageProcessingOptions` specifying how
@ -233,23 +223,63 @@ export class InteractiveSegmenter extends VisionTaskRunner {
image: ImageSource, roi: RegionOfInterest, image: ImageSource, roi: RegionOfInterest,
imageProcessingOptions: ImageProcessingOptions, imageProcessingOptions: ImageProcessingOptions,
callback: InteractiveSegmenterCallback): void; callback: InteractiveSegmenterCallback): void;
/**
* Performs interactive segmentation on the provided single image and returns
* the segmentation result. This method creates a copy of the resulting masks
* and should not be used in high-throughput applications. The `roi` parameter
* is used to represent a user's region of interest for segmentation.
*
* @param image An image to process.
* @param roi The region of interest for segmentation.
* @return The segmentation result. The data is copied to avoid lifetime
* limits.
*/
segment(image: ImageSource, roi: RegionOfInterest):
InteractiveSegmenterResult;
/**
* Performs interactive segmentation on the provided single image and returns
* the segmentation result. This method creates a copy of the resulting masks
* and should not be used in high-throughput applications. The `roi` parameter
* is used to represent a user's region of interest for segmentation.
*
* The 'image_processing_options' parameter can be used to specify the
* rotation to apply to the image before performing segmentation, by setting
* its 'rotationDegrees' field. Note that specifying a region-of-interest
* using the 'regionOfInterest' field is NOT supported and will result in an
* error.
*
* @param image An image to process.
* @param roi The region of interest for segmentation.
* @param imageProcessingOptions the `ImageProcessingOptions` specifying how
* to process the input image before running inference.
* @return The segmentation result. The data is copied to avoid lifetime
* limits.
*/
segment( segment(
image: ImageSource, roi: RegionOfInterest, image: ImageSource, roi: RegionOfInterest,
imageProcessingOptionsOrCallback: ImageProcessingOptions| imageProcessingOptions: ImageProcessingOptions):
InteractiveSegmenterResult;
segment(
image: ImageSource, roi: RegionOfInterest,
imageProcessingOptionsOrCallback?: ImageProcessingOptions|
InteractiveSegmenterCallback, InteractiveSegmenterCallback,
callback?: InteractiveSegmenterCallback): void { callback?: InteractiveSegmenterCallback): InteractiveSegmenterResult|
void {
const imageProcessingOptions = const imageProcessingOptions =
typeof imageProcessingOptionsOrCallback !== 'function' ? typeof imageProcessingOptionsOrCallback !== 'function' ?
imageProcessingOptionsOrCallback : imageProcessingOptionsOrCallback :
{}; {};
this.userCallback = typeof imageProcessingOptionsOrCallback === 'function' ? this.userCallback = typeof imageProcessingOptionsOrCallback === 'function' ?
imageProcessingOptionsOrCallback : imageProcessingOptionsOrCallback :
callback!; callback;
this.reset(); this.reset();
this.processRenderData(roi, this.getSynctheticTimestamp()); this.processRenderData(roi, this.getSynctheticTimestamp());
this.processImageData(image, imageProcessingOptions); this.processImageData(image, imageProcessingOptions);
this.userCallback = () => {};
if (!this.userCallback) {
return this.result;
}
} }
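A hedged sketch of the callback-free call added here; `interactiveSegmenter` and `image` are assumed to exist in scope:

```ts
// segment() without a callback returns copied masks that the caller must close.
const roi: RegionOfInterest = {keypoint: {x: 0.5, y: 0.5}};
const result = interactiveSegmenter.segment(image, roi);
result.confidenceMasks?.forEach(mask => mask.close());
result.categoryMask?.close();
```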
private reset(): void { private reset(): void {
@ -265,8 +295,10 @@ export class InteractiveSegmenter extends VisionTaskRunner {
return; return;
} }
if (this.userCallback) {
this.userCallback(this.result); this.userCallback(this.result);
} }
}
/** Updates the MediaPipe graph configuration. */ /** Updates the MediaPipe graph configuration. */
protected override refreshGraph(): void { protected override refreshGraph(): void {
@ -295,8 +327,9 @@ export class InteractiveSegmenter extends VisionTaskRunner {
this.graphRunner.attachImageVectorListener( this.graphRunner.attachImageVectorListener(
CONFIDENCE_MASKS_STREAM, (masks, timestamp) => { CONFIDENCE_MASKS_STREAM, (masks, timestamp) => {
this.result.confidenceMasks = this.result.confidenceMasks = masks.map(
masks.map(wasmImage => this.convertToMPImage(wasmImage)); wasmImage => this.convertToMPMask(
wasmImage, /* shouldCopyData= */ !this.userCallback));
this.setLatestOutputTimestamp(timestamp); this.setLatestOutputTimestamp(timestamp);
this.maybeInvokeCallback(); this.maybeInvokeCallback();
}); });
@ -314,7 +347,8 @@ export class InteractiveSegmenter extends VisionTaskRunner {
this.graphRunner.attachImageListener( this.graphRunner.attachImageListener(
CATEGORY_MASK_STREAM, (mask, timestamp) => { CATEGORY_MASK_STREAM, (mask, timestamp) => {
this.result.categoryMask = this.convertToMPImage(mask); this.result.categoryMask = this.convertToMPMask(
mask, /* shouldCopyData= */ !this.userCallback);
this.setLatestOutputTimestamp(timestamp); this.setLatestOutputTimestamp(timestamp);
this.maybeInvokeCallback(); this.maybeInvokeCallback();
}); });
@ -338,16 +372,31 @@ export class InteractiveSegmenter extends VisionTaskRunner {
const renderData = new RenderDataProto(); const renderData = new RenderDataProto();
const renderAnnotation = new RenderAnnotationProto(); const renderAnnotation = new RenderAnnotationProto();
const color = new ColorProto(); const color = new ColorProto();
color.setR(255); color.setR(255);
renderAnnotation.setColor(color); renderAnnotation.setColor(color);
if (roi.keypoint && roi.scribble) {
throw new Error('Cannot provide both keypoint and scribble.');
} else if (roi.keypoint) {
const point = new RenderAnnotationProto.Point(); const point = new RenderAnnotationProto.Point();
point.setNormalized(true); point.setNormalized(true);
point.setX(roi.keypoint.x); point.setX(roi.keypoint.x);
point.setY(roi.keypoint.y); point.setY(roi.keypoint.y);
renderAnnotation.setPoint(point); renderAnnotation.setPoint(point);
} else if (roi.scribble) {
const scribble = new RenderAnnotationProto.Scribble();
for (const coord of roi.scribble) {
const point = new RenderAnnotationProto.Point();
point.setNormalized(true);
point.setX(coord.x);
point.setY(coord.y);
scribble.addPoint(point);
}
renderAnnotation.setScribble(scribble);
} else {
throw new Error('Must provide either a keypoint or a scribble.');
}
renderData.addRenderAnnotations(renderAnnotation); renderData.addRenderAnnotations(renderAnnotation);
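The validation above accepts exactly one of two ROI shapes (coordinates normalized to [0, 1]); supplying both, or neither, throws. Illustrative values only:

```ts
const keypointRoi: RegionOfInterest = {keypoint: {x: 0.1, y: 0.2}};
const scribbleRoi: RegionOfInterest = {
  scribble: [{x: 0.1, y: 0.2}, {x: 0.3, y: 0.4}]
};
```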

View File

@ -14,7 +14,7 @@
* limitations under the License. * limitations under the License.
*/ */
import {MPImage} from '../../../../tasks/web/vision/core/image'; import {MPMask} from '../../../../tasks/web/vision/core/mask';
/** The output result of InteractiveSegmenter. */ /** The output result of InteractiveSegmenter. */
export declare interface InteractiveSegmenterResult { export declare interface InteractiveSegmenterResult {
@ -23,12 +23,12 @@ export declare interface InteractiveSegmenterResult {
* `MPImage`s where, for each mask, each pixel represents the prediction * `MPImage`s where, for each mask, each pixel represents the prediction
* confidence, usually in the [0, 1] range. * confidence, usually in the [0, 1] range.
*/ */
confidenceMasks?: MPImage[]; confidenceMasks?: MPMask[];
/** /**
* A category mask represented as a `Uint8ClampedArray` or * A category mask represented as a `Uint8ClampedArray` or
* `WebGLTexture`-backed `MPImage` where each pixel represents the class which * `WebGLTexture`-backed `MPImage` where each pixel represents the class which
* the pixel in the original image was predicted to belong to. * the pixel in the original image was predicted to belong to.
*/ */
categoryMask?: MPImage; categoryMask?: MPMask;
} }

View File

@ -19,17 +19,21 @@ import 'jasmine';
// Placeholder for internal dependency on encodeByteArray // Placeholder for internal dependency on encodeByteArray
import {CalculatorGraphConfig} from '../../../../framework/calculator_pb'; import {CalculatorGraphConfig} from '../../../../framework/calculator_pb';
import {addJasmineCustomFloatEqualityTester, createSpyWasmModule, MediapipeTasksFake, SpyWasmModule, verifyGraph} from '../../../../tasks/web/core/task_runner_test_utils'; import {addJasmineCustomFloatEqualityTester, createSpyWasmModule, MediapipeTasksFake, SpyWasmModule, verifyGraph} from '../../../../tasks/web/core/task_runner_test_utils';
import {MPImage} from '../../../../tasks/web/vision/core/image'; import {MPMask} from '../../../../tasks/web/vision/core/mask';
import {RenderData as RenderDataProto} from '../../../../util/render_data_pb'; import {RenderData as RenderDataProto} from '../../../../util/render_data_pb';
import {WasmImage} from '../../../../web/graph_runner/graph_runner_image_lib'; import {WasmImage} from '../../../../web/graph_runner/graph_runner_image_lib';
import {InteractiveSegmenter, RegionOfInterest} from './interactive_segmenter'; import {InteractiveSegmenter, RegionOfInterest} from './interactive_segmenter';
const ROI: RegionOfInterest = { const KEYPOINT: RegionOfInterest = {
keypoint: {x: 0.1, y: 0.2} keypoint: {x: 0.1, y: 0.2}
}; };
const SCRIBBLE: RegionOfInterest = {
scribble: [{x: 0.1, y: 0.2}, {x: 0.3, y: 0.4}]
};
class InteractiveSegmenterFake extends InteractiveSegmenter implements class InteractiveSegmenterFake extends InteractiveSegmenter implements
MediapipeTasksFake { MediapipeTasksFake {
calculatorName = calculatorName =
@ -134,26 +138,46 @@ describe('InteractiveSegmenter', () => {
it('doesn\'t support region of interest', () => { it('doesn\'t support region of interest', () => {
expect(() => { expect(() => {
interactiveSegmenter.segment( interactiveSegmenter.segment(
{} as HTMLImageElement, ROI, {} as HTMLImageElement, KEYPOINT,
{regionOfInterest: {left: 0, right: 0, top: 0, bottom: 0}}, () => {}); {regionOfInterest: {left: 0, right: 0, top: 0, bottom: 0}}, () => {});
}).toThrowError('This task doesn\'t support region-of-interest.'); }).toThrowError('This task doesn\'t support region-of-interest.');
}); });
it('sends region-of-interest', (done) => { it('sends region-of-interest with keypoint', (done) => {
interactiveSegmenter.fakeWasmModule._waitUntilIdle.and.callFake(() => { interactiveSegmenter.fakeWasmModule._waitUntilIdle.and.callFake(() => {
expect(interactiveSegmenter.lastRoi).toBeDefined(); expect(interactiveSegmenter.lastRoi).toBeDefined();
expect(interactiveSegmenter.lastRoi!.toObject().renderAnnotationsList![0]) expect(interactiveSegmenter.lastRoi!.toObject().renderAnnotationsList![0])
.toEqual(jasmine.objectContaining({ .toEqual(jasmine.objectContaining({
color: {r: 255, b: undefined, g: undefined}, color: {r: 255, b: undefined, g: undefined},
point: {x: 0.1, y: 0.2, normalized: true},
})); }));
done(); done();
}); });
interactiveSegmenter.segment({} as HTMLImageElement, ROI, () => {}); interactiveSegmenter.segment({} as HTMLImageElement, KEYPOINT, () => {});
});
it('sends region-of-interest with scribble', (done) => {
interactiveSegmenter.fakeWasmModule._waitUntilIdle.and.callFake(() => {
expect(interactiveSegmenter.lastRoi).toBeDefined();
expect(interactiveSegmenter.lastRoi!.toObject().renderAnnotationsList![0])
.toEqual(jasmine.objectContaining({
color: {r: 255, b: undefined, g: undefined},
scribble: {
pointList: [
{x: 0.1, y: 0.2, normalized: true},
{x: 0.3, y: 0.4, normalized: true}
]
},
}));
done();
});
interactiveSegmenter.segment({} as HTMLImageElement, SCRIBBLE, () => {});
}); });
it('supports category mask', async () => { it('supports category mask', async () => {
const mask = new Uint8ClampedArray([1, 2, 3, 4]); const mask = new Uint8Array([1, 2, 3, 4]);
await interactiveSegmenter.setOptions( await interactiveSegmenter.setOptions(
{outputCategoryMask: true, outputConfidenceMasks: false}); {outputCategoryMask: true, outputConfidenceMasks: false});
@ -168,10 +192,10 @@ describe('InteractiveSegmenter', () => {
// Invoke the image segmenter // Invoke the image segmenter
return new Promise<void>(resolve => { return new Promise<void>(resolve => {
interactiveSegmenter.segment({} as HTMLImageElement, ROI, result => { interactiveSegmenter.segment({} as HTMLImageElement, KEYPOINT, result => {
expect(interactiveSegmenter.fakeWasmModule._waitUntilIdle) expect(interactiveSegmenter.fakeWasmModule._waitUntilIdle)
.toHaveBeenCalled(); .toHaveBeenCalled();
expect(result.categoryMask).toBeInstanceOf(MPImage); expect(result.categoryMask).toBeInstanceOf(MPMask);
expect(result.categoryMask!.width).toEqual(2); expect(result.categoryMask!.width).toEqual(2);
expect(result.categoryMask!.height).toEqual(2); expect(result.categoryMask!.height).toEqual(2);
expect(result.confidenceMasks).not.toBeDefined(); expect(result.confidenceMasks).not.toBeDefined();
@ -199,23 +223,23 @@ describe('InteractiveSegmenter', () => {
}); });
return new Promise<void>(resolve => { return new Promise<void>(resolve => {
// Invoke the image segmenter // Invoke the image segmenter
interactiveSegmenter.segment({} as HTMLImageElement, ROI, result => { interactiveSegmenter.segment({} as HTMLImageElement, KEYPOINT, result => {
expect(interactiveSegmenter.fakeWasmModule._waitUntilIdle) expect(interactiveSegmenter.fakeWasmModule._waitUntilIdle)
.toHaveBeenCalled(); .toHaveBeenCalled();
expect(result.categoryMask).not.toBeDefined(); expect(result.categoryMask).not.toBeDefined();
expect(result.confidenceMasks![0]).toBeInstanceOf(MPImage); expect(result.confidenceMasks![0]).toBeInstanceOf(MPMask);
expect(result.confidenceMasks![0].width).toEqual(2); expect(result.confidenceMasks![0].width).toEqual(2);
expect(result.confidenceMasks![0].height).toEqual(2); expect(result.confidenceMasks![0].height).toEqual(2);
expect(result.confidenceMasks![1]).toBeInstanceOf(MPImage); expect(result.confidenceMasks![1]).toBeInstanceOf(MPMask);
resolve(); resolve();
}); });
}); });
}); });
it('supports combined category and confidence masks', async () => { it('supports combined category and confidence masks', async () => {
const categoryMask = new Uint8ClampedArray([1]); const categoryMask = new Uint8Array([1]);
const confidenceMask1 = new Float32Array([0.0]); const confidenceMask1 = new Float32Array([0.0]);
const confidenceMask2 = new Float32Array([1.0]); const confidenceMask2 = new Float32Array([1.0]);
@ -239,22 +263,22 @@ describe('InteractiveSegmenter', () => {
return new Promise<void>(resolve => { return new Promise<void>(resolve => {
// Invoke the image segmenter // Invoke the image segmenter
interactiveSegmenter.segment( interactiveSegmenter.segment(
{} as HTMLImageElement, ROI, result => { {} as HTMLImageElement, KEYPOINT, result => {
expect(interactiveSegmenter.fakeWasmModule._waitUntilIdle) expect(interactiveSegmenter.fakeWasmModule._waitUntilIdle)
.toHaveBeenCalled(); .toHaveBeenCalled();
expect(result.categoryMask).toBeInstanceOf(MPImage); expect(result.categoryMask).toBeInstanceOf(MPMask);
expect(result.categoryMask!.width).toEqual(1); expect(result.categoryMask!.width).toEqual(1);
expect(result.categoryMask!.height).toEqual(1); expect(result.categoryMask!.height).toEqual(1);
expect(result.confidenceMasks![0]).toBeInstanceOf(MPImage); expect(result.confidenceMasks![0]).toBeInstanceOf(MPMask);
expect(result.confidenceMasks![1]).toBeInstanceOf(MPImage); expect(result.confidenceMasks![1]).toBeInstanceOf(MPMask);
resolve(); resolve();
}); });
}); });
}); });
it('invokes listener once masks are avaiblae', async () => { it('invokes listener once masks are avaiblae', async () => {
const categoryMask = new Uint8ClampedArray([1]); const categoryMask = new Uint8Array([1]);
const confidenceMask = new Float32Array([0.0]); const confidenceMask = new Float32Array([0.0]);
let listenerCalled = false; let listenerCalled = false;
@ -276,10 +300,28 @@ describe('InteractiveSegmenter', () => {
}); });
return new Promise<void>(resolve => { return new Promise<void>(resolve => {
interactiveSegmenter.segment({} as HTMLImageElement, ROI, () => { interactiveSegmenter.segment({} as HTMLImageElement, KEYPOINT, () => {
listenerCalled = true; listenerCalled = true;
resolve(); resolve();
}); });
}); });
}); });
it('returns result', () => {
const confidenceMask = new Float32Array([0.0]);
// Pass the test data to our listener
interactiveSegmenter.fakeWasmModule._waitUntilIdle.and.callFake(() => {
interactiveSegmenter.confidenceMasksListener!(
[
{data: confidenceMask, width: 1, height: 1},
],
1337);
});
const result =
interactiveSegmenter.segment({} as HTMLImageElement, KEYPOINT);
expect(result.confidenceMasks![0]).toBeInstanceOf(MPMask);
result.confidenceMasks![0].close();
});
}); });

View File

@ -45,7 +45,7 @@ mediapipe_ts_declaration(
"//mediapipe/tasks/web/components/containers:category", "//mediapipe/tasks/web/components/containers:category",
"//mediapipe/tasks/web/components/containers:landmark", "//mediapipe/tasks/web/components/containers:landmark",
"//mediapipe/tasks/web/core", "//mediapipe/tasks/web/core",
"//mediapipe/tasks/web/vision/core:image", "//mediapipe/tasks/web/vision/core:mask",
"//mediapipe/tasks/web/vision/core:vision_task_options", "//mediapipe/tasks/web/vision/core:vision_task_options",
], ],
) )
@ -63,7 +63,7 @@ mediapipe_ts_library(
"//mediapipe/tasks/web/components/processors:landmark_result", "//mediapipe/tasks/web/components/processors:landmark_result",
"//mediapipe/tasks/web/core", "//mediapipe/tasks/web/core",
"//mediapipe/tasks/web/core:task_runner_test_utils", "//mediapipe/tasks/web/core:task_runner_test_utils",
"//mediapipe/tasks/web/vision/core:image", "//mediapipe/tasks/web/vision/core:mask",
"//mediapipe/tasks/web/vision/core:vision_task_runner", "//mediapipe/tasks/web/vision/core:vision_task_runner",
], ],
) )

View File

@ -43,7 +43,6 @@ const IMAGE_STREAM = 'image_in';
const NORM_RECT_STREAM = 'norm_rect'; const NORM_RECT_STREAM = 'norm_rect';
const NORM_LANDMARKS_STREAM = 'normalized_landmarks'; const NORM_LANDMARKS_STREAM = 'normalized_landmarks';
const WORLD_LANDMARKS_STREAM = 'world_landmarks'; const WORLD_LANDMARKS_STREAM = 'world_landmarks';
const AUXILIARY_LANDMARKS_STREAM = 'auxiliary_landmarks';
const SEGMENTATION_MASK_STREAM = 'segmentation_masks'; const SEGMENTATION_MASK_STREAM = 'segmentation_masks';
const POSE_LANDMARKER_GRAPH = const POSE_LANDMARKER_GRAPH =
'mediapipe.tasks.vision.pose_landmarker.PoseLandmarkerGraph'; 'mediapipe.tasks.vision.pose_landmarker.PoseLandmarkerGraph';
@ -64,7 +63,7 @@ export type PoseLandmarkerCallback = (result: PoseLandmarkerResult) => void;
export class PoseLandmarker extends VisionTaskRunner { export class PoseLandmarker extends VisionTaskRunner {
private result: Partial<PoseLandmarkerResult> = {}; private result: Partial<PoseLandmarkerResult> = {};
private outputSegmentationMasks = false; private outputSegmentationMasks = false;
private userCallback: PoseLandmarkerCallback = () => {}; private userCallback?: PoseLandmarkerCallback;
private readonly options: PoseLandmarkerGraphOptions; private readonly options: PoseLandmarkerGraphOptions;
private readonly poseLandmarksDetectorGraphOptions: private readonly poseLandmarksDetectorGraphOptions:
PoseLandmarksDetectorGraphOptions; PoseLandmarksDetectorGraphOptions;
@ -200,21 +199,22 @@ export class PoseLandmarker extends VisionTaskRunner {
} }
/** /**
* Performs pose detection on the provided single image and waits * Performs pose detection on the provided single image and invokes the
* synchronously for the response. Only use this method when the * callback with the response. The method returns synchronously once the
* PoseLandmarker is created with running mode `image`. * callback returns. Only use this method when the PoseLandmarker is created
* with running mode `image`.
* *
* @param image An image to process. * @param image An image to process.
* @param callback The callback that is invoked with the result. The * @param callback The callback that is invoked with the result. The
* lifetime of the returned masks is only guaranteed for the duration of * lifetime of the returned masks is only guaranteed for the duration of
* the callback. * the callback.
* @return The detected pose landmarks.
*/ */
detect(image: ImageSource, callback: PoseLandmarkerCallback): void; detect(image: ImageSource, callback: PoseLandmarkerCallback): void;
/** /**
* Performs pose detection on the provided single image and waits * Performs pose detection on the provided single image and invokes the
* synchronously for the response. Only use this method when the * callback with the response. The method returns synchronously once the
* PoseLandmarker is created with running mode `image`. * callback returns. Only use this method when the PoseLandmarker is created
* with running mode `image`.
* *
* @param image An image to process. * @param image An image to process.
* @param imageProcessingOptions the `ImageProcessingOptions` specifying how * @param imageProcessingOptions the `ImageProcessingOptions` specifying how
@ -222,16 +222,42 @@ export class PoseLandmarker extends VisionTaskRunner {
* @param callback The callback that is invoked with the result. The * @param callback The callback that is invoked with the result. The
* lifetime of the returned masks is only guaranteed for the duration of * lifetime of the returned masks is only guaranteed for the duration of
* the callback. * the callback.
* @return The detected pose landmarks.
*/ */
detect( detect(
image: ImageSource, imageProcessingOptions: ImageProcessingOptions, image: ImageSource, imageProcessingOptions: ImageProcessingOptions,
callback: PoseLandmarkerCallback): void; callback: PoseLandmarkerCallback): void;
/**
* Performs pose detection on the provided single image and waits
* synchronously for the response. This method creates a copy of the resulting
* masks and should not be used in high-throughput applications. Only
* use this method when the PoseLandmarker is created with running mode
* `image`.
*
* @param image An image to process.
* @return The landmarker result with the detected pose landmarks. Any masks
* are copied to avoid lifetime limits.
*/
detect(image: ImageSource): PoseLandmarkerResult;
/**
* Performs pose detection on the provided single image and waits
* synchronously for the response. This method creates a copy of the resulting
* masks and should not be used in high-throughput applications. Only
* use this method when the PoseLandmarker is created with running mode
* `image`.
*
* @param image An image to process.
* @param imageProcessingOptions the `ImageProcessingOptions` specifying how
* to process the input image before running inference.
* @return The landmarker result with the detected pose landmarks. Any masks
* are copied to avoid lifetime limits.
*/
detect(image: ImageSource, imageProcessingOptions: ImageProcessingOptions):
PoseLandmarkerResult;
detect( detect(
image: ImageSource, image: ImageSource,
imageProcessingOptionsOrCallback: ImageProcessingOptions| imageProcessingOptionsOrCallback?: ImageProcessingOptions|
PoseLandmarkerCallback, PoseLandmarkerCallback,
callback?: PoseLandmarkerCallback): void { callback?: PoseLandmarkerCallback): PoseLandmarkerResult|void {
const imageProcessingOptions = const imageProcessingOptions =
typeof imageProcessingOptionsOrCallback !== 'function' ? typeof imageProcessingOptionsOrCallback !== 'function' ?
imageProcessingOptionsOrCallback : imageProcessingOptionsOrCallback :
@ -242,59 +268,94 @@ export class PoseLandmarker extends VisionTaskRunner {
this.resetResults(); this.resetResults();
this.processImageData(image, imageProcessingOptions); this.processImageData(image, imageProcessingOptions);
this.userCallback = () => {};
if (!this.userCallback) {
return this.result as PoseLandmarkerResult;
}
} }
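As with the segmenters, a hedged usage sketch of the new callback-free overload; `poseLandmarker` (created in `image` mode) and `image` are assumed to exist in scope:

```ts
const result = poseLandmarker.detect(image);
// landmarks is now a per-pose array of landmark lists (see the result changes below).
for (const pose of result.landmarks) {
  console.log(`detected pose with ${pose.length} landmarks`);
}
// Segmentation masks, if requested, are copies and must be closed.
result.segmentationMasks?.forEach(mask => mask.close());
```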
/** /**
* Performs pose detection on the provided video frame and waits * Performs pose detection on the provided video frame and invokes the
* synchronously for the response. Only use this method when the * callback with the response. The method returns synchronously once the
* PoseLandmarker is created with running mode `video`. * callback returns. Only use this method when the PoseLandmarker is created
* with running mode `video`.
* *
* @param videoFrame A video frame to process. * @param videoFrame A video frame to process.
* @param timestamp The timestamp of the current frame, in ms. * @param timestamp The timestamp of the current frame, in ms.
* @param callback The callback that is invoked with the result. The * @param callback The callback that is invoked with the result. The
* lifetime of the returned masks is only guaranteed for the duration of * lifetime of the returned masks is only guaranteed for the duration of
* the callback. * the callback.
* @return The detected pose landmarks.
*/ */
detectForVideo( detectForVideo(
videoFrame: ImageSource, timestamp: number, videoFrame: ImageSource, timestamp: number,
callback: PoseLandmarkerCallback): void; callback: PoseLandmarkerCallback): void;
/** /**
* Performs pose detection on the provided video frame and waits * Performs pose detection on the provided video frame and invokes the
* synchronously for the response. Only use this method when the * callback with the response. The method returns synchronously once the
* PoseLandmarker is created with running mode `video`. * callback returns. Only use this method when the PoseLandmarker is created
* with running mode `video`.
* *
* @param videoFrame A video frame to process. * @param videoFrame A video frame to process.
* @param timestamp The timestamp of the current frame, in ms.
* @param imageProcessingOptions the `ImageProcessingOptions` specifying how * @param imageProcessingOptions the `ImageProcessingOptions` specifying how
* to process the input image before running inference. * to process the input image before running inference.
* @param timestamp The timestamp of the current frame, in ms.
* @param callback The callback that is invoked with the result. The * @param callback The callback that is invoked with the result. The
* lifetime of the returned masks is only guaranteed for the duration of * lifetime of the returned masks is only guaranteed for the duration of
* the callback. * the callback.
* @return The detected pose landmarks.
*/ */
detectForVideo( detectForVideo(
videoFrame: ImageSource, imageProcessingOptions: ImageProcessingOptions, videoFrame: ImageSource, timestamp: number,
timestamp: number, callback: PoseLandmarkerCallback): void; imageProcessingOptions: ImageProcessingOptions,
callback: PoseLandmarkerCallback): void;
/**
* Performs pose detection on the provided video frame and returns the result.
* This method creates a copy of the resulting masks and should not be used
* in high-throughput applications. Only use this method when the
* PoseLandmarker is created with running mode `video`.
*
* @param videoFrame A video frame to process.
* @param timestamp The timestamp of the current frame, in ms.
* @return The landmarker result. Any masks are copied to extend the
* lifetime of the returned data.
*/
detectForVideo(videoFrame: ImageSource, timestamp: number):
PoseLandmarkerResult;
/**
* Performs pose detection on the provided video frame and returns the result.
* This method creates a copy of the resulting masks and should not be used
* in high-throughput applications. Only use this method when the
* PoseLandmarker is created with running mode `video`.
*
* @param videoFrame A video frame to process.
* @param timestamp The timestamp of the current frame, in ms.
* @param imageProcessingOptions the `ImageProcessingOptions` specifying how
* to process the input image before running inference.
* @return The landmarker result. Any masks are copied to extend the lifetime
* of the returned data.
*/
detectForVideo( detectForVideo(
videoFrame: ImageSource, videoFrame: ImageSource, timestamp: number,
timestampOrImageProcessingOptions: number|ImageProcessingOptions, imageProcessingOptions: ImageProcessingOptions): PoseLandmarkerResult;
timestampOrCallback: number|PoseLandmarkerCallback, detectForVideo(
callback?: PoseLandmarkerCallback): void { videoFrame: ImageSource, timestamp: number,
imageProcessingOptionsOrCallback?: ImageProcessingOptions|
PoseLandmarkerCallback,
callback?: PoseLandmarkerCallback): PoseLandmarkerResult|void {
const imageProcessingOptions = const imageProcessingOptions =
typeof timestampOrImageProcessingOptions !== 'number' ? typeof imageProcessingOptionsOrCallback !== 'function' ?
timestampOrImageProcessingOptions : imageProcessingOptionsOrCallback :
{}; {};
const timestamp = typeof timestampOrImageProcessingOptions === 'number' ? this.userCallback = typeof imageProcessingOptionsOrCallback === 'function' ?
timestampOrImageProcessingOptions : imageProcessingOptionsOrCallback :
timestampOrCallback as number; callback;
this.userCallback = typeof timestampOrCallback === 'function' ?
timestampOrCallback :
callback!;
this.resetResults(); this.resetResults();
this.processVideoData(videoFrame, imageProcessingOptions, timestamp); this.processVideoData(videoFrame, imageProcessingOptions, timestamp);
this.userCallback = () => {};
if (!this.userCallback) {
return this.result as PoseLandmarkerResult;
}
} }
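And the matching sketch for the video path with the reordered signature (timestamp before the optional `ImageProcessingOptions`), assuming `video` running mode and an `HTMLVideoElement` named `video`:

```ts
const videoResult = poseLandmarker.detectForVideo(video, performance.now());
videoResult.segmentationMasks?.forEach(mask => mask.close());
```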
private resetResults(): void { private resetResults(): void {
@ -309,14 +370,14 @@ export class PoseLandmarker extends VisionTaskRunner {
if (!('worldLandmarks' in this.result)) { if (!('worldLandmarks' in this.result)) {
return; return;
} }
if (!('landmarks' in this.result)) {
return;
}
if (this.outputSegmentationMasks && !('segmentationMasks' in this.result)) { if (this.outputSegmentationMasks && !('segmentationMasks' in this.result)) {
return; return;
} }
if (this.userCallback) {
this.userCallback(this.result as Required<PoseLandmarkerResult>); this.userCallback(this.result as Required<PoseLandmarkerResult>);
} }
}
/** Sets the default values for the graph. */ /** Sets the default values for the graph. */
private initDefaults(): void { private initDefaults(): void {
@ -332,10 +393,11 @@ export class PoseLandmarker extends VisionTaskRunner {
* Converts raw data into a landmark, and adds it to our landmarks list. * Converts raw data into a landmark, and adds it to our landmarks list.
*/ */
private addJsLandmarks(data: Uint8Array[]): void { private addJsLandmarks(data: Uint8Array[]): void {
this.result.landmarks = [];
for (const binaryProto of data) { for (const binaryProto of data) {
const poseLandmarksProto = const poseLandmarksProto =
            NormalizedLandmarkList.deserializeBinary(binaryProto);
-      this.result.landmarks = convertToLandmarks(poseLandmarksProto);
+      this.result.landmarks.push(convertToLandmarks(poseLandmarksProto));
    }
  }
@ -344,24 +406,12 @@ export class PoseLandmarker extends VisionTaskRunner {
   * worldLandmarks list.
   */
  private adddJsWorldLandmarks(data: Uint8Array[]): void {
+    this.result.worldLandmarks = [];
    for (const binaryProto of data) {
      const poseWorldLandmarksProto =
          LandmarkList.deserializeBinary(binaryProto);
-      this.result.worldLandmarks =
-          convertToWorldLandmarks(poseWorldLandmarksProto);
+      this.result.worldLandmarks.push(
+          convertToWorldLandmarks(poseWorldLandmarksProto));
-    }
-  }
-
-  /**
-   * Converts raw data into a landmark, and adds it to our auxilary
-   * landmarks list.
-   */
-  private addJsAuxiliaryLandmarks(data: Uint8Array[]): void {
-    for (const binaryProto of data) {
-      const auxiliaryLandmarksProto =
-          NormalizedLandmarkList.deserializeBinary(binaryProto);
-      this.result.auxilaryLandmarks =
-          convertToLandmarks(auxiliaryLandmarksProto);
    }
  }
@ -372,7 +422,6 @@ export class PoseLandmarker extends VisionTaskRunner {
    graphConfig.addInputStream(NORM_RECT_STREAM);
    graphConfig.addOutputStream(NORM_LANDMARKS_STREAM);
    graphConfig.addOutputStream(WORLD_LANDMARKS_STREAM);
-    graphConfig.addOutputStream(AUXILIARY_LANDMARKS_STREAM);
    graphConfig.addOutputStream(SEGMENTATION_MASK_STREAM);
    const calculatorOptions = new CalculatorOptions();
@ -385,8 +434,6 @@ export class PoseLandmarker extends VisionTaskRunner {
    landmarkerNode.addInputStream('NORM_RECT:' + NORM_RECT_STREAM);
    landmarkerNode.addOutputStream('NORM_LANDMARKS:' + NORM_LANDMARKS_STREAM);
    landmarkerNode.addOutputStream('WORLD_LANDMARKS:' + WORLD_LANDMARKS_STREAM);
-    landmarkerNode.addOutputStream(
-        'AUXILIARY_LANDMARKS:' + AUXILIARY_LANDMARKS_STREAM);
    landmarkerNode.setOptions(calculatorOptions);
    graphConfig.addNode(landmarkerNode);
@ -417,26 +464,14 @@ export class PoseLandmarker extends VisionTaskRunner {
          this.maybeInvokeCallback();
        });
-    this.graphRunner.attachProtoVectorListener(
-        AUXILIARY_LANDMARKS_STREAM, (binaryProto, timestamp) => {
-          this.addJsAuxiliaryLandmarks(binaryProto);
-          this.setLatestOutputTimestamp(timestamp);
-          this.maybeInvokeCallback();
-        });
-    this.graphRunner.attachEmptyPacketListener(
-        AUXILIARY_LANDMARKS_STREAM, timestamp => {
-          this.result.auxilaryLandmarks = [];
-          this.setLatestOutputTimestamp(timestamp);
-          this.maybeInvokeCallback();
-        });
    if (this.outputSegmentationMasks) {
      landmarkerNode.addOutputStream(
          'SEGMENTATION_MASK:' + SEGMENTATION_MASK_STREAM);
      this.graphRunner.attachImageVectorListener(
          SEGMENTATION_MASK_STREAM, (masks, timestamp) => {
-            this.result.segmentationMasks =
-                masks.map(wasmImage => this.convertToMPImage(wasmImage));
+            this.result.segmentationMasks = masks.map(
+                wasmImage => this.convertToMPMask(
+                    wasmImage, /* shouldCopyData= */ !this.userCallback));
            this.setLatestOutputTimestamp(timestamp);
            this.maybeInvokeCallback();
          });
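The listener above converts each segmentation mask with `convertToMPMask` and only copies the underlying pixel data when no user callback is registered (`/* shouldCopyData= */ !this.userCallback`), so masks handed to a callback are only valid for the duration of that callback. A minimal usage sketch under that assumption; the package name, the `FilesetResolver`/`createFromOptions` calls, and the asset paths are illustrative and not part of this change:

```ts
// Hypothetical usage sketch; package name, FilesetResolver/createFromOptions,
// and file paths are assumptions for illustration, not part of this diff.
import {FilesetResolver, MPMask, PoseLandmarker} from '@mediapipe/tasks-vision';

async function runPoseSegmentation(image: HTMLImageElement): Promise<void> {
  const vision = await FilesetResolver.forVisionTasks('/wasm');  // assumed asset path
  const poseLandmarker = await PoseLandmarker.createFromOptions(vision, {
    baseOptions: {modelAssetPath: 'pose_landmarker.task'},  // hypothetical model file
    outputSegmentationMasks: true,
  });

  // With a callback, masks wrap WASM/GPU-backed data and are only guaranteed
  // valid inside the callback (shouldCopyData is false on this path).
  poseLandmarker.detect(image, result => {
    const masks: MPMask[] = result.segmentationMasks ?? [];
    console.log(`poses: ${result.landmarks.length}, masks: ${masks.length}`);
  });

  // Without a callback, the returned masks are copies (shouldCopyData is true),
  // so they remain usable after detect() returns.
  const result = poseLandmarker.detect(image);
  console.log(result.segmentationMasks?.length ?? 0);
}
```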
@ -16,7 +16,7 @@
import {Category} from '../../../../tasks/web/components/containers/category';
import {Landmark, NormalizedLandmark} from '../../../../tasks/web/components/containers/landmark';
-import {MPImage} from '../../../../tasks/web/vision/core/image';
+import {MPMask} from '../../../../tasks/web/vision/core/mask';

export {Category, Landmark, NormalizedLandmark};
@ -26,14 +26,11 @@ export {Category, Landmark, NormalizedLandmark};
 */
export declare interface PoseLandmarkerResult {
  /** Pose landmarks of detected poses. */
-  landmarks: NormalizedLandmark[];
+  landmarks: NormalizedLandmark[][];
  /** Pose landmarks in world coordinates of detected poses. */
-  worldLandmarks: Landmark[];
+  worldLandmarks: Landmark[][];
-  /** Detected auxiliary landmarks, used for deriving ROI for next frame. */
-  auxilaryLandmarks: NormalizedLandmark[];
  /** Segmentation mask for the detected pose. */
-  segmentationMasks?: MPImage[];
+  segmentationMasks?: MPMask[];
}
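Since `landmarks` and `worldLandmarks` are now indexed first by pose and then by landmark, consumers iterate one extra level. A minimal sketch against the interface above; the relative import path is an assumption:

```ts
import {PoseLandmarkerResult} from './pose_landmarker_result';  // hypothetical relative path

// Walks the per-pose structure of the result declared above.
function summarizePoses(result: PoseLandmarkerResult): void {
  result.landmarks.forEach((pose, i) => {
    // `pose` is the landmark list of one detected pose; `worldLandmarks[i]`
    // is the matching list in world coordinates.
    const world = result.worldLandmarks[i] ?? [];
    console.log(`pose ${i}: ${pose.length} landmarks, ${world.length} world landmarks`);
  });
  // Segmentation masks are optional and only present when the option is enabled.
  console.log(`masks: ${result.segmentationMasks?.length ?? 0}`);
}
```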
@ -18,7 +18,7 @@ import 'jasmine';
import {CalculatorGraphConfig} from '../../../../framework/calculator_pb';
import {createLandmarks, createWorldLandmarks} from '../../../../tasks/web/components/processors/landmark_result_test_lib';
import {addJasmineCustomFloatEqualityTester, createSpyWasmModule, MediapipeTasksFake, SpyWasmModule, verifyGraph} from '../../../../tasks/web/core/task_runner_test_utils';
-import {MPImage} from '../../../../tasks/web/vision/core/image';
+import {MPMask} from '../../../../tasks/web/vision/core/mask';
import {VisionGraphRunner} from '../../../../tasks/web/vision/core/vision_task_runner';
import {PoseLandmarker} from './pose_landmarker';
@ -45,8 +45,7 @@ class PoseLandmarkerFake extends PoseLandmarker implements MediapipeTasksFake {
    this.attachListenerSpies[0] =
        spyOn(this.graphRunner, 'attachProtoVectorListener')
            .and.callFake((stream, listener) => {
-              expect(stream).toMatch(
-                  /(normalized_landmarks|world_landmarks|auxiliary_landmarks)/);
+              expect(stream).toMatch(/(normalized_landmarks|world_landmarks)/);
              this.listeners.set(stream, listener as PacketListener);
            });
    this.attachListenerSpies[1] =
@ -80,23 +79,23 @@ describe('PoseLandmarker', () => {
  it('initializes graph', async () => {
    verifyGraph(poseLandmarker);
-    expect(poseLandmarker.listeners).toHaveSize(3);
+    expect(poseLandmarker.listeners).toHaveSize(2);
  });

  it('reloads graph when settings are changed', async () => {
    await poseLandmarker.setOptions({numPoses: 1});
    verifyGraph(poseLandmarker, [['poseDetectorGraphOptions', 'numPoses'], 1]);
-    expect(poseLandmarker.listeners).toHaveSize(3);
+    expect(poseLandmarker.listeners).toHaveSize(2);

    await poseLandmarker.setOptions({numPoses: 5});
    verifyGraph(poseLandmarker, [['poseDetectorGraphOptions', 'numPoses'], 5]);
-    expect(poseLandmarker.listeners).toHaveSize(3);
+    expect(poseLandmarker.listeners).toHaveSize(2);
  });

  it('registers listener for segmentation masks', async () => {
-    expect(poseLandmarker.listeners).toHaveSize(3);
+    expect(poseLandmarker.listeners).toHaveSize(2);
    await poseLandmarker.setOptions({outputSegmentationMasks: true});
-    expect(poseLandmarker.listeners).toHaveSize(4);
+    expect(poseLandmarker.listeners).toHaveSize(3);
  });

  it('merges options', async () => {
@ -209,8 +208,6 @@ describe('PoseLandmarker', () => {
          (landmarksProto, 1337);
      poseLandmarker.listeners.get('world_landmarks')!
          (worldLandmarksProto, 1337);
-      poseLandmarker.listeners.get('auxiliary_landmarks')!
-          (landmarksProto, 1337);
      poseLandmarker.listeners.get('segmentation_masks')!(masks, 1337);
    });
@ -222,10 +219,9 @@ describe('PoseLandmarker', () => {
          .toHaveBeenCalledTimes(1);
      expect(poseLandmarker.fakeWasmModule._waitUntilIdle).toHaveBeenCalled();
-      expect(result.landmarks).toEqual([{'x': 0, 'y': 0, 'z': 0}]);
-      expect(result.worldLandmarks).toEqual([{'x': 0, 'y': 0, 'z': 0}]);
-      expect(result.auxilaryLandmarks).toEqual([{'x': 0, 'y': 0, 'z': 0}]);
-      expect(result.segmentationMasks![0]).toBeInstanceOf(MPImage);
+      expect(result.landmarks).toEqual([[{'x': 0, 'y': 0, 'z': 0}]]);
+      expect(result.worldLandmarks).toEqual([[{'x': 0, 'y': 0, 'z': 0}]]);
+      expect(result.segmentationMasks![0]).toBeInstanceOf(MPMask);
      done();
    });
  });
@ -240,8 +236,6 @@ describe('PoseLandmarker', () => {
          (landmarksProto, 1337);
      poseLandmarker.listeners.get('world_landmarks')!
          (worldLandmarksProto, 1337);
-      poseLandmarker.listeners.get('auxiliary_landmarks')!
-          (landmarksProto, 1337);
    });

    // Invoke the pose landmarker twice
@ -261,7 +255,39 @@ describe('PoseLandmarker', () => {
      expect(landmarks1).toEqual(landmarks2);
    });

-  it('invokes listener once masks are avaiblae', (done) => {
+  it('supports multiple poses', (done) => {
const landmarksProto = [
createLandmarks(0.1, 0.2, 0.3).serializeBinary(),
createLandmarks(0.4, 0.5, 0.6).serializeBinary()
];
const worldLandmarksProto = [
createWorldLandmarks(1, 2, 3).serializeBinary(),
createWorldLandmarks(4, 5, 6).serializeBinary()
];
poseLandmarker.setOptions({numPoses: 1});
// Pass the test data to our listener
poseLandmarker.fakeWasmModule._waitUntilIdle.and.callFake(() => {
poseLandmarker.listeners.get('normalized_landmarks')!
(landmarksProto, 1337);
poseLandmarker.listeners.get('world_landmarks')!
(worldLandmarksProto, 1337);
});
// Invoke the pose landmarker
poseLandmarker.detect({} as HTMLImageElement, result => {
expect(result.landmarks).toEqual([
[{'x': 0.1, 'y': 0.2, 'z': 0.3}], [{'x': 0.4, 'y': 0.5, 'z': 0.6}]
]);
expect(result.worldLandmarks).toEqual([
[{'x': 1, 'y': 2, 'z': 3}], [{'x': 4, 'y': 5, 'z': 6}]
]);
done();
});
});
it('invokes listener once masks are available', (done) => {
    const landmarksProto = [createLandmarks().serializeBinary()];
    const worldLandmarksProto = [createWorldLandmarks().serializeBinary()];
    const masks = [
@ -281,8 +307,6 @@ describe('PoseLandmarker', () => {
      poseLandmarker.listeners.get('world_landmarks')!
          (worldLandmarksProto, 1337);
      expect(listenerCalled).toBeFalse();
-      poseLandmarker.listeners.get('auxiliary_landmarks')!
-          (landmarksProto, 1337);
-      expect(listenerCalled).toBeFalse();
      poseLandmarker.listeners.get('segmentation_masks')!(masks, 1337);
      expect(listenerCalled).toBeTrue();
@ -294,4 +318,23 @@ describe('PoseLandmarker', () => {
      listenerCalled = true;
    });
  });
it('returns result', () => {
const landmarksProto = [createLandmarks().serializeBinary()];
const worldLandmarksProto = [createWorldLandmarks().serializeBinary()];
// Pass the test data to our listener
poseLandmarker.fakeWasmModule._waitUntilIdle.and.callFake(() => {
poseLandmarker.listeners.get('normalized_landmarks')!
(landmarksProto, 1337);
poseLandmarker.listeners.get('world_landmarks')!
(worldLandmarksProto, 1337);
});
// Invoke the pose landmarker
const result = poseLandmarker.detect({} as HTMLImageElement);
expect(poseLandmarker.fakeWasmModule._waitUntilIdle).toHaveBeenCalled();
expect(result.landmarks).toEqual([[{'x': 0, 'y': 0, 'z': 0}]]);
expect(result.worldLandmarks).toEqual([[{'x': 0, 'y': 0, 'z': 0}]]);
});
});
@ -16,7 +16,8 @@
export * from '../../../tasks/web/core/fileset_resolver';
export * from '../../../tasks/web/vision/core/drawing_utils';
-export {MPImage, MPImageChannelConverter, MPImageType} from '../../../tasks/web/vision/core/image';
+export {MPImage} from '../../../tasks/web/vision/core/image';
+export {MPMask} from '../../../tasks/web/vision/core/mask';
export * from '../../../tasks/web/vision/face_detector/face_detector';
export * from '../../../tasks/web/vision/face_landmarker/face_landmarker';
export * from '../../../tasks/web/vision/face_stylizer/face_stylizer';
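With this change the vision barrel exports both the slimmed-down `MPImage` and the new `MPMask`. A small sketch of importing them through the public entry point; the npm package name and the `width`/`height` accessors are assumptions and are not shown in this diff:

```ts
import {MPImage, MPMask} from '@mediapipe/tasks-vision';  // assumed package name

// Reports pixel dimensions for either container type (both are assumed to
// expose width/height accessors).
function describeFrame(frame: MPImage | MPMask): string {
  return `${frame.constructor.name}: ${frame.width}x${frame.height}`;
}
```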
@ -10,7 +10,7 @@ type LibConstructor = new (...args: any[]) => GraphRunner;

/** An image returned from a MediaPipe graph. */
export interface WasmImage {
-  data: Uint8ClampedArray|Float32Array|WebGLTexture;
+  data: Uint8Array|Float32Array|WebGLTexture;
  width: number;
  height: number;
}
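Consumers that branch on the `data` payload need to narrow the union accordingly now that the CPU byte case is `Uint8Array`. A minimal narrowing sketch; the import path is hypothetical and only the interface above is taken from this change:

```ts
import {WasmImage} from './graph_runner_image_lib';  // hypothetical relative path

function describeWasmImage(image: WasmImage): string {
  const {data, width, height} = image;
  if (data instanceof Uint8Array) {
    return `uint8 image ${width}x${height} (${data.length} bytes)`;
  }
  if (data instanceof Float32Array) {
    return `float32 image ${width}x${height} (${data.length} values)`;
  }
  // Remaining case: a WebGLTexture whose pixels live on the GPU.
  return `GPU-backed image ${width}x${height}`;
}
```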
third_party/BUILD
@ -13,6 +13,9 @@
# limitations under the License.
#

+load("@rules_foreign_cc//tools/build_defs:cmake.bzl", "cmake_external")
+load("@bazel_skylib//:bzl_library.bzl", "bzl_library")

licenses(["notice"])  # Apache License 2.0

exports_files(["LICENSE"])
@ -61,16 +64,73 @@ config_setting(
    visibility = ["//visibility:public"],
)
config_setting(
name = "opencv_ios_arm64_source_build",
define_values = {
"OPENCV": "source",
},
values = {
"apple_platform_type": "ios",
"cpu": "ios_arm64",
},
)
config_setting(
name = "opencv_ios_sim_arm64_source_build",
define_values = {
"OPENCV": "source",
},
values = {
"apple_platform_type": "ios",
"cpu": "ios_sim_arm64",
},
)
config_setting(
name = "opencv_ios_x86_64_source_build",
define_values = {
"OPENCV": "source",
},
values = {
"apple_platform_type": "ios",
"cpu": "ios_x86_64",
},
)
config_setting(
name = "opencv_ios_sim_fat_source_build",
define_values = {
"OPENCV": "source",
},
values = {
"apple_platform_type": "ios",
"ios_multi_cpus": "sim_arm64, x86_64",
},
)
alias(
    name = "opencv",
    actual = select({
        ":opencv_source_build": ":opencv_cmake",
+        ":opencv_ios_sim_arm64_source_build": "@ios_opencv_source//:opencv",
+        ":opencv_ios_sim_fat_source_build": "@ios_opencv_source//:opencv",
+        ":opencv_ios_arm64_source_build": "@ios_opencv_source//:opencv",
        "//conditions:default": ":opencv_binary",
    }),
    visibility = ["//visibility:public"],
)

-load("@rules_foreign_cc//tools/build_defs:cmake.bzl", "cmake_external")
+bzl_library(
+    name = "opencv_ios_xcframework_files_bzl",
+    srcs = ["opencv_ios_xcframework_files.bzl"],
+    visibility = ["//visibility:private"],
+)
+
+bzl_library(
+    name = "opencv_ios_source_bzl",
+    srcs = ["opencv_ios_source.bzl"],
+    visibility = ["//visibility:private"],
+)

# Note: this determines the order in which the libraries are passed to the
# linker, so if library A depends on library B, library B must come _after_.
@ -204,8 +204,8 @@ def external_files():
    http_file(
        name = "com_google_mediapipe_conv2d_input_channel_1_tflite",
-        sha256 = "126edac445967799f3b8b124d15483b1506f6d6cb57a501c1636eb8f2fb3734f",
-        urls = ["https://storage.googleapis.com/mediapipe-assets/conv2d_input_channel_1.tflite?generation=1678218348519744"],
+        sha256 = "ccb667092f3aed3a35a57fb3478fecc0c8f6360dbf477a9db9c24e5b3ec4273e",
+        urls = ["https://storage.googleapis.com/mediapipe-assets/conv2d_input_channel_1.tflite?generation=1683252905577703"],
    )

    http_file(
@ -246,8 +246,8 @@ def external_files():
    http_file(
        name = "com_google_mediapipe_dense_tflite",
-        sha256 = "be9323068461b1cbf412692ee916be30dcb1a5fb59a9ee875d470bc340d9e869",
-        urls = ["https://storage.googleapis.com/mediapipe-assets/dense.tflite?generation=1678218351373709"],
+        sha256 = "6795e7c3a263f44e97be048a5e1166e0921b453bfbaf037f4f69ac5c059ee945",
+        urls = ["https://storage.googleapis.com/mediapipe-assets/dense.tflite?generation=1683252907920466"],
    )

    http_file(
@ -960,8 +960,8 @@ def external_files():
    http_file(
        name = "com_google_mediapipe_portrait_selfie_segmentation_expected_category_mask_jpg",
-        sha256 = "d8f20fa746e14067f668dd293f21bbc50ec81196d186386a6ded1278c3ec8f46",
-        urls = ["https://storage.googleapis.com/mediapipe-assets/portrait_selfie_segmentation_expected_category_mask.jpg?generation=1678606935088873"],
+        sha256 = "1400c6fccf3805bfd1644d7ed9be98dfa4f900e1720838c566963f8d9f10f5d0",
+        urls = ["https://storage.googleapis.com/mediapipe-assets/portrait_selfie_segmentation_expected_category_mask.jpg?generation=1683332555306471"],
    )

    http_file(
@ -972,8 +972,8 @@ def external_files():
    http_file(
        name = "com_google_mediapipe_portrait_selfie_segmentation_landscape_expected_category_mask_jpg",
-        sha256 = "f5c3fa3d93f8e7289b69b8a89c2519276dfa5014dcc50ed6e86e8cd4d4ae7f27",
-        urls = ["https://storage.googleapis.com/mediapipe-assets/portrait_selfie_segmentation_landscape_expected_category_mask.jpg?generation=1678606939469429"],
+        sha256 = "a208aeeeb615fd40046d883e2c7982458e1b12edd6526e88c305c4053b0a9399",
+        urls = ["https://storage.googleapis.com/mediapipe-assets/portrait_selfie_segmentation_landscape_expected_category_mask.jpg?generation=1683332557473435"],
    )

    http_file(
@ -1158,14 +1158,14 @@ def external_files():
    http_file(
        name = "com_google_mediapipe_selfie_segmentation_landscape_tflite",
-        sha256 = "28fb4c287d6295a2dba6c1f43b43315a37f927ddcd6693d635d625d176eef162",
-        urls = ["https://storage.googleapis.com/mediapipe-assets/selfie_segmentation_landscape.tflite?generation=1678775102234495"],
+        sha256 = "a77d03f4659b9f6b6c1f5106947bf40e99d7655094b6527f214ea7d451106edd",
+        urls = ["https://storage.googleapis.com/mediapipe-assets/selfie_segmentation_landscape.tflite?generation=1683332561312022"],
    )

    http_file(
        name = "com_google_mediapipe_selfie_segmentation_tflite",
-        sha256 = "b0e2ec6f95107795b952b27f3d92806b45f0bc069dac76dcd264cd1b90d61c6c",
-        urls = ["https://storage.googleapis.com/mediapipe-assets/selfie_segmentation.tflite?generation=1678775104900954"],
+        sha256 = "9ee168ec7c8f2a16c56fe8e1cfbc514974cbbb7e434051b455635f1bd1462f5c",
+        urls = ["https://storage.googleapis.com/mediapipe-assets/selfie_segmentation.tflite?generation=1683332563830600"],
    )

    http_file(
third_party/opencv_ios_source.BUILD
@ -0,0 +1,125 @@
# Description:
# OpenCV xcframework for video/image processing on iOS.
licenses(["notice"]) # BSD license
exports_files(["LICENSE"])
load(
"@build_bazel_rules_apple//apple:apple.bzl",
"apple_static_xcframework_import",
)
load(
"@//third_party:opencv_ios_source.bzl",
"select_headers",
"unzip_opencv_xcframework",
)
# Build opencv2.xcframework from source using a convenience script provided in
# OPENCV sources and zip the xcframework. We only build the modules required by MediaPipe by specifying
# the modules to be ignored as command line arguments.
# We also specify the simulator and device architectures we are building for.
# Currently we only support arm64 (M1 Macs) and x86_64 (Intel Macs) iOS simulators
# and arm64 iOS devices.
# Bitcode and Swift support are disabled. Swift support will be added when the
# final binaries for the MediaPipe iOS task libraries are built. Shipping OpenCV
# built with Swift support throws linker errors when the MediaPipe framework is
# used from an iOS project.
genrule(
name = "build_opencv_xcframework",
srcs = glob(["opencv-4.5.1/**"]),
outs = ["opencv2.xcframework.zip"],
cmd = "&&".join([
"$(location opencv-4.5.1/platforms/apple/build_xcframework.py) \
--iphonesimulator_archs arm64,x86_64 \
--iphoneos_archs arm64 \
--without dnn \
--without ml \
--without stitching \
--without photo \
--without objdetect \
--without gapi \
--without flann \
--disable PROTOBUF \
--disable-bitcode \
--disable-swift \
--build_only_specified_archs \
--out $(@D)",
"cd $(@D)",
"zip --symlinks -r opencv2.xcframework.zip opencv2.xcframework",
]),
)
# Unzips `opencv2.xcframework.zip` built from source by `build_opencv_xcframework`
# genrule and returns an exhaustive list of all its files including symlinks.
unzip_opencv_xcframework(
name = "opencv2_unzipped_xcframework_files",
zip_file = "opencv2.xcframework.zip",
)
# Imports the files of the unzipped `opencv2.xcframework` as an apple static
# framework which can be linked to iOS targets.
apple_static_xcframework_import(
name = "opencv_xcframework",
visibility = ["//visibility:public"],
xcframework_imports = [":opencv2_unzipped_xcframework_files"],
)
# Filters the headers for each platform in `opencv2.xcframework` which will be
# used as headers in a `cc_library` that can be linked to C++ targets.
select_headers(
name = "opencv_xcframework_device_headers",
srcs = [":opencv_xcframework"],
platform = "ios-arm64",
)
select_headers(
name = "opencv_xcframework_simulator_headers",
srcs = [":opencv_xcframework"],
platform = "ios-arm64_x86_64-simulator",
)
# `cc_library` that can be linked to C++ targets to import opencv headers.
cc_library(
name = "opencv",
hdrs = select({
"@//mediapipe:ios_x86_64": [
":opencv_xcframework_simulator_headers",
],
"@//mediapipe:ios_sim_arm64": [
":opencv_xcframework_simulator_headers",
],
"@//mediapipe:ios_arm64": [
":opencv_xcframework_simulator_headers",
],
# A value from above is chosen arbitrarily.
"//conditions:default": [
":opencv_xcframework_simulator_headers",
],
}),
copts = [
"-std=c++11",
"-x objective-c++",
],
include_prefix = "opencv2",
linkopts = [
"-framework AssetsLibrary",
"-framework CoreFoundation",
"-framework CoreGraphics",
"-framework CoreMedia",
"-framework Accelerate",
"-framework CoreImage",
"-framework AVFoundation",
"-framework CoreVideo",
"-framework QuartzCore",
],
strip_include_prefix = select({
"@//mediapipe:ios_x86_64": "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers",
"@//mediapipe:ios_sim_arm64": "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers",
"@//mediapipe:ios_arm64": "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers",
# An arbitrary value from above is selected for the default case.
"//conditions:default": "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers",
}),
visibility = ["//visibility:public"],
deps = [":opencv_xcframework"],
)
third_party/opencv_ios_source.bzl
@ -0,0 +1,158 @@
# Copyright 2023 The MediaPipe Authors. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Custom rules for building iOS OpenCV xcframework from sources."""
load(
"@//third_party:opencv_ios_xcframework_files.bzl",
"OPENCV_XCFRAMEWORK_INFO_PLIST_PATH",
"OPENCV_XCFRAMEWORK_IOS_DEVICE_FILE_PATHS",
"OPENCV_XCFRAMEWORK_IOS_SIMULATOR_FILE_PATHS",
)
_OPENCV_XCFRAMEWORK_DIR_NAME = "opencv2.xcframework"
_OPENCV_FRAMEWORK_DIR_NAME = "opencv2.framework"
_OPENCV_SIMULATOR_PLATFORM_DIR_NAME = "ios-arm64_x86_64-simulator"
_OPENCV_DEVICE_PLATFORM_DIR_NAME = "ios-arm64"
def _select_headers_impl(ctx):
_files = [
f
for f in ctx.files.srcs
if (f.basename.endswith(".h") or f.basename.endswith(".hpp")) and
f.dirname.find(ctx.attr.platform) != -1
]
return [DefaultInfo(files = depset(_files))]
# This rule selects only the headers from an apple static xcframework filtered by
# an input platform string.
select_headers = rule(
implementation = _select_headers_impl,
attrs = {
"srcs": attr.label_list(mandatory = True, allow_files = True),
"platform": attr.string(mandatory = True),
},
)
# This function declares and returns symlinks to the directories expected to be
# present within each platform of `opencv2.xcframework`.
# The symlinks are created according to the structure stipulated by Apple xcframeworks
# so that they can be correctly consumed by the `apple_static_xcframework_import` rule.
def _opencv2_directory_symlinks(ctx, platforms):
basenames = ["Resources", "Headers", "Modules", "Versions/Current"]
symlinks = []
for platform in platforms:
symlinks = symlinks + [
ctx.actions.declare_symlink(
_OPENCV_XCFRAMEWORK_DIR_NAME + "/{}/{}/{}".format(platform, _OPENCV_FRAMEWORK_DIR_NAME, name),
)
for name in basenames
]
return symlinks
# This function declares and returns all the files for each platform expected
# to be present in `opencv2.xcframework` after the unzipping action is run.
def _opencv2_file_list(ctx, platform_filepath_lists):
binary_name = "opencv2"
output_files = []
binaries_to_symlink = []
for (platform, filepaths) in platform_filepath_lists:
for path in filepaths:
file = ctx.actions.declare_file(path)
output_files.append(file)
if path.endswith(binary_name):
symlink_output = ctx.actions.declare_file(
_OPENCV_XCFRAMEWORK_DIR_NAME + "/{}/{}/{}".format(
platform,
_OPENCV_FRAMEWORK_DIR_NAME,
binary_name,
),
)
binaries_to_symlink.append((symlink_output, file))
return output_files, binaries_to_symlink
def _unzip_opencv_xcframework_impl(ctx):
# Array to iterate over the various platforms to declare output files and
# symlinks.
platform_filepath_lists = [
(_OPENCV_SIMULATOR_PLATFORM_DIR_NAME, OPENCV_XCFRAMEWORK_IOS_SIMULATOR_FILE_PATHS),
(_OPENCV_DEVICE_PLATFORM_DIR_NAME, OPENCV_XCFRAMEWORK_IOS_DEVICE_FILE_PATHS),
]
# Gets an exhaustive list of output files which are present in the xcframework.
# Also gets an array of `(binary symlink, binary)` pairs which are to be symlinked
# using `ctx.actions.symlink()`.
output_files, binaries_to_symlink = _opencv2_file_list(ctx, platform_filepath_lists)
output_files.append(ctx.actions.declare_file(OPENCV_XCFRAMEWORK_INFO_PLIST_PATH))
# xcframeworks have a directory structure in which the `opencv2.framework` folders for each
# platform contain directories which are symlinked to the respective folders of the version
# in use. Simply unzipping the zip of the framework will not make Bazel treat these
# as symlinks. They have to be explicitly declared as symlinks using `ctx.actions.declare_symlink()`.
directory_symlinks = _opencv2_directory_symlinks(
ctx,
[_OPENCV_SIMULATOR_PLATFORM_DIR_NAME, _OPENCV_DEVICE_PLATFORM_DIR_NAME],
)
output_files = output_files + directory_symlinks
args = ctx.actions.args()
# Add the path of the zip file to be unzipped as an argument to be passed to
# `run_shell` action.
args.add(ctx.file.zip_file.path)
# Add the path to the directory in which the framework is to be unzipped to.
args.add(ctx.file.zip_file.dirname)
ctx.actions.run_shell(
inputs = [ctx.file.zip_file],
outputs = output_files,
arguments = [args],
progress_message = "Unzipping %s" % ctx.file.zip_file.short_path,
command = "unzip -qq $1 -d $2",
)
# The symlinks to the opencv2 binaries for each platform in the xcframework
# have to be created using `ctx.actions.symlink()`, unlike the directory
# symlinks, which can be expected to be valid once unzipping is completed.
# Otherwise, when tests are run, the linker complains that the binary is
# not found.
binary_symlink_files = []
for (symlink_output, binary_file) in binaries_to_symlink:
ctx.actions.symlink(output = symlink_output, target_file = binary_file)
binary_symlink_files.append(symlink_output)
# Return all the declared output files and symlinks as the output of this
# rule.
return [DefaultInfo(files = depset(output_files + binary_symlink_files))]
# This rule unzips an `opencv2.xcframework.zip` created by a genrule that
# invokes a python script in the opencv 4.5.1 github archive.
# It returns all the contents of opencv2.xcframework as a list of files in the
# output. This rule works by explicitly declaring files at hardcoded
# paths in the opencv2 xcframework bundle which are expected to be present when
# the zip file is unzipped. This is a prerequisite since the outputs of this rule
# will be consumed by apple_static_xcframework_import which can only take a list
# of files as inputs.
unzip_opencv_xcframework = rule(
implementation = _unzip_opencv_xcframework_impl,
attrs = {
"zip_file": attr.label(mandatory = True, allow_single_file = True),
},
)
@ -0,0 +1,468 @@
# Copyright 2023 The MediaPipe Authors. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""List of file paths in the `opencv2.xcframework` bundle."""
OPENCV_XCFRAMEWORK_INFO_PLIST_PATH = "opencv2.xcframework/Info.plist"
OPENCV_XCFRAMEWORK_IOS_SIMULATOR_FILE_PATHS = [
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Resources/Info.plist",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Moments.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/imgproc.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfRect2d.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfFloat4.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfPoint2i.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/video/tracking.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/video/legacy/constants_c.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/video/background_segm.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/video/video.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Double3.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfByte.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Range.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Core.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Size2f.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/world.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/opencv2-Swift.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/fast_math.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda_types.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/check.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cv_cpu_dispatch.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/utility.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/softfloat.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cv_cpu_helper.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cvstd.inl.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/msa_macros.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/intrin.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/intrin_rvv.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/simd_utils.impl.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/intrin_wasm.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/intrin_neon.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/intrin_avx.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/intrin_avx512.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/intrin_vsx.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/interface.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/intrin_msa.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/intrin_cpp.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/intrin_forward.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/intrin_sse.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/intrin_sse_em.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/hal.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/async.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/bufferpool.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/ovx.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/optim.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/va_intel.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cvdef.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/warp.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/filters.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/dynamic_smem.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/reduce.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/utility.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/warp_shuffle.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/border_interpolate.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/transform.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/saturate_cast.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/vec_math.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/functional.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/limits.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/type_traits.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/vec_distance.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/block.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/detail/reduce.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/detail/reduce_key_val.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/detail/color_detail.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/detail/type_traits_detail.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/detail/vec_distance_detail.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/detail/transform_detail.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/emulation.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/color.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/datamov_utils.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/funcattrib.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/common.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/vec_traits.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/simd_functions.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/warp_reduce.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/scan.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/traits.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opengl.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cvstd_wrapper.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda.inl.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/eigen.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda_stream_accessor.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/ocl.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/affine.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/mat.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/utils/logger.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/utils/allocator_stats.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/utils/allocator_stats.impl.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/utils/logtag.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/utils/filesystem.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/utils/tls.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/utils/trace.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/utils/instrumentation.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/utils/logger.defines.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/quaternion.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/neon_utils.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/sse_utils.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/version.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/opencl_info.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_gl.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_svm_definitions.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_svm_hsa_extension.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_clamdblas.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_core.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_svm_20.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_core_wrappers.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_gl_wrappers.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_clamdfft.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/runtime/autogenerated/opencl_gl.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/runtime/autogenerated/opencl_clamdblas.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/runtime/autogenerated/opencl_core.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/runtime/autogenerated/opencl_core_wrappers.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/runtime/autogenerated/opencl_gl_wrappers.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/runtime/autogenerated/opencl_clamdfft.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/ocl_defs.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/opencl_svm.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/ocl_genbase.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/detail/async_promise.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/detail/exception_ptr.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/simd_intrinsics.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/matx.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/directx.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/base.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/operations.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/vsx_utils.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/persistence.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/mat.inl.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/types_c.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cvstd.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/types.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/bindings_utils.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/quaternion.inl.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/saturate.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/core_c.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/core.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Converters.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Mat.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Algorithm.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/opencv.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Mat+Converters.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/ByteVector.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/imgproc/imgproc.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/imgproc/imgproc_c.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/imgproc/hal/interface.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/imgproc/hal/hal.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/imgproc/detail/gcgraph.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/imgproc/types_c.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/highgui.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/features2d.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Point2f.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/KeyPoint.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Rect2f.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Float6.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfKeyPoint.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfRect2i.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/FloatVector.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/TermCriteria.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/opencv2.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Int4.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfDMatch.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Scalar.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Point3f.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfDouble.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/IntVector.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/RotatedRect.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfFloat6.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/cvconfig.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/DoubleVector.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Size2d.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MinMaxLocResult.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfInt4.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Rect2i.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Point2i.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfPoint3.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfRotatedRect.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/DMatch.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/TickMeter.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Point3i.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/video.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/imgcodecs/ios.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/imgcodecs/legacy/constants_c.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/imgcodecs/macosx.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/imgcodecs/imgcodecs.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/imgcodecs/imgcodecs_c.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/CvType.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/CVObjcUtil.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Size2i.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/imgcodecs.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Float4.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/videoio/registry.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/videoio/cap_ios.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/videoio/legacy/constants_c.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/videoio/videoio.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/videoio/videoio_c.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfFloat.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Rect2d.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfPoint2f.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Point2d.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/highgui/highgui.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/highgui/highgui_c.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Double2.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/CvCamera2.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/features2d/hal/interface.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/features2d/features2d.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/videoio.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/opencv_modules.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core.hpp",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfInt.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/ArrayUtil.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfPoint3f.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Point3d.h",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Modules/opencv2.swiftmodule/x86_64-apple-ios-simulator.swiftinterface",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Modules/opencv2.swiftmodule/arm64-apple-ios-simulator.abi.json",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Modules/opencv2.swiftmodule/arm64-apple-ios-simulator.private.swiftinterface",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Modules/opencv2.swiftmodule/x86_64-apple-ios-simulator.swiftdoc",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Modules/opencv2.swiftmodule/Project/arm64-apple-ios-simulator.swiftsourceinfo",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Modules/opencv2.swiftmodule/Project/x86_64-apple-ios-simulator.swiftsourceinfo",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Modules/opencv2.swiftmodule/arm64-apple-ios-simulator.swiftinterface",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Modules/opencv2.swiftmodule/x86_64-apple-ios-simulator.private.swiftinterface",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Modules/opencv2.swiftmodule/arm64-apple-ios-simulator.swiftdoc",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Modules/opencv2.swiftmodule/x86_64-apple-ios-simulator.abi.json",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Modules/module.modulemap",
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/opencv2",
]
OPENCV_XCFRAMEWORK_IOS_DEVICE_FILE_PATHS = [
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Resources/Info.plist",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Moments.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/imgproc.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfRect2d.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfFloat4.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfPoint2i.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/video/tracking.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/video/legacy/constants_c.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/video/background_segm.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/video/video.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Double3.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfByte.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Range.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Core.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Size2f.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/world.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/opencv2-Swift.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/fast_math.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda_types.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/check.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cv_cpu_dispatch.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/utility.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/softfloat.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cv_cpu_helper.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cvstd.inl.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/msa_macros.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/intrin.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/intrin_rvv.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/simd_utils.impl.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/intrin_wasm.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/intrin_neon.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/intrin_avx.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/intrin_avx512.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/intrin_vsx.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/interface.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/intrin_msa.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/intrin_cpp.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/intrin_forward.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/intrin_sse.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/intrin_sse_em.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/hal.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/async.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/bufferpool.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/ovx.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/optim.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/va_intel.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cvdef.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/warp.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/filters.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/dynamic_smem.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/reduce.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/utility.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/warp_shuffle.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/border_interpolate.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/transform.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/saturate_cast.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/vec_math.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/functional.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/limits.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/type_traits.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/vec_distance.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/block.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/detail/reduce.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/detail/reduce_key_val.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/detail/color_detail.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/detail/type_traits_detail.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/detail/vec_distance_detail.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/detail/transform_detail.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/emulation.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/color.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/datamov_utils.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/funcattrib.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/common.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/vec_traits.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/simd_functions.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/warp_reduce.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/scan.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/traits.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opengl.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cvstd_wrapper.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda.inl.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/eigen.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda_stream_accessor.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/ocl.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/affine.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/mat.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/utils/logger.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/utils/allocator_stats.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/utils/allocator_stats.impl.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/utils/logtag.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/utils/filesystem.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/utils/tls.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/utils/trace.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/utils/instrumentation.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/utils/logger.defines.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/quaternion.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/neon_utils.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/sse_utils.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/version.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/opencl_info.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_gl.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_svm_definitions.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_svm_hsa_extension.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_clamdblas.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_core.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_svm_20.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_core_wrappers.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_gl_wrappers.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_clamdfft.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/runtime/autogenerated/opencl_gl.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/runtime/autogenerated/opencl_clamdblas.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/runtime/autogenerated/opencl_core.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/runtime/autogenerated/opencl_core_wrappers.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/runtime/autogenerated/opencl_gl_wrappers.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/runtime/autogenerated/opencl_clamdfft.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/ocl_defs.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/opencl_svm.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/ocl_genbase.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/detail/async_promise.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/detail/exception_ptr.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/simd_intrinsics.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/matx.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/directx.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/base.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/operations.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/vsx_utils.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/persistence.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/mat.inl.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/types_c.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cvstd.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/types.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/bindings_utils.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/quaternion.inl.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/saturate.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/core_c.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/core.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Converters.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Mat.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Algorithm.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/opencv.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Mat+Converters.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/ByteVector.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/imgproc/imgproc.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/imgproc/imgproc_c.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/imgproc/hal/interface.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/imgproc/hal/hal.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/imgproc/detail/gcgraph.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/imgproc/types_c.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/highgui.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/features2d.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Point2f.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/KeyPoint.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Rect2f.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Float6.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfKeyPoint.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfRect2i.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/FloatVector.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/TermCriteria.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/opencv2.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Int4.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfDMatch.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Scalar.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Point3f.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfDouble.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/IntVector.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/RotatedRect.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfFloat6.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/cvconfig.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/DoubleVector.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Size2d.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MinMaxLocResult.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfInt4.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Rect2i.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Point2i.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfPoint3.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfRotatedRect.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/DMatch.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/TickMeter.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Point3i.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/video.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/imgcodecs/ios.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/imgcodecs/legacy/constants_c.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/imgcodecs/macosx.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/imgcodecs/imgcodecs.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/imgcodecs/imgcodecs_c.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/CvType.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/CVObjcUtil.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Size2i.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/imgcodecs.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Float4.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/videoio/registry.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/videoio/cap_ios.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/videoio/legacy/constants_c.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/videoio/videoio.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/videoio/videoio_c.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfFloat.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Rect2d.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfPoint2f.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Point2d.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/highgui/highgui.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/highgui/highgui_c.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Double2.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/CvCamera2.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/features2d/hal/interface.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/features2d/features2d.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/videoio.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/opencv_modules.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core.hpp",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfInt.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/ArrayUtil.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfPoint3f.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Point3d.h",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Modules/opencv2.swiftmodule/arm64-apple-ios.swiftinterface",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Modules/opencv2.swiftmodule/arm64-apple-ios.swiftdoc",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Modules/opencv2.swiftmodule/Project/arm64-apple-ios.swiftsourceinfo",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Modules/opencv2.swiftmodule/arm64-apple-ios.abi.json",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Modules/opencv2.swiftmodule/arm64-apple-ios.private.swiftinterface",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Modules/module.modulemap",
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/opencv2",
]