diff --git a/README.md b/README.md index a82c88ab1..cb3d56de6 100644 --- a/README.md +++ b/README.md @@ -4,8 +4,6 @@ title: Home nav_order: 1 --- -![MediaPipe](https://mediapipe.dev/images/mediapipe_small.png) - ---- **Attention:** *Thanks for your interest in MediaPipe! We have moved to @@ -14,86 +12,111 @@ as the primary developer documentation site for MediaPipe as of April 3, 2023.* *This notice and web page will be removed on June 1, 2023.* ----- +![MediaPipe](https://developers.google.com/static/mediapipe/images/home/hero_01_1920.png) -









-









-









+**Attention**: MediaPipe Solutions Preview is an early release. [Learn +more](https://developers.google.com/mediapipe/solutions/about#notice). --------------------------------------------------------------------------------- +**On-device machine learning for everyone** -## Live ML anywhere +Delight your customers with innovative machine learning features. MediaPipe +contains everything that you need to customize and deploy to mobile (Android, +iOS), web, desktop, edge devices, and IoT, effortlessly. -[MediaPipe](https://google.github.io/mediapipe/) offers cross-platform, customizable -ML solutions for live and streaming media. +* [See demos](https://goo.gle/mediapipe-studio) +* [Learn more](https://developers.google.com/mediapipe/solutions) -![accelerated.png](https://mediapipe.dev/images/accelerated_small.png) | ![cross_platform.png](https://mediapipe.dev/images/cross_platform_small.png) -:------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------: -***End-to-End acceleration***: *Built-in fast ML inference and processing accelerated even on common hardware* | ***Build once, deploy anywhere***: *Unified solution works across Android, iOS, desktop/cloud, web and IoT* -![ready_to_use.png](https://mediapipe.dev/images/ready_to_use_small.png) | ![open_source.png](https://mediapipe.dev/images/open_source_small.png) -***Ready-to-use solutions***: *Cutting-edge ML solutions demonstrating full power of the framework* | ***Free and open source***: *Framework and solutions both under Apache 2.0, fully extensible and customizable* +## Get started ----- +You can get started with MediaPipe Solutions by checking out any of the +developer guides for +[vision](https://developers.google.com/mediapipe/solutions/vision/object_detector), +[text](https://developers.google.com/mediapipe/solutions/text/text_classifier), +and +[audio](https://developers.google.com/mediapipe/solutions/audio/audio_classifier) +tasks. If you need help setting up a development environment for use with +MediaPipe Tasks, check out the setup guides for +[Android](https://developers.google.com/mediapipe/solutions/setup_android), [web +apps](https://developers.google.com/mediapipe/solutions/setup_web), and +[Python](https://developers.google.com/mediapipe/solutions/setup_python). 
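As a quick illustration of the Tasks API that the new "Get started" text above points to, here is a minimal Python sketch of running the object detection task. This is not part of the change itself: the model and image paths are placeholders, and the option names should be checked against the vision developer guide linked above.

```python
# Minimal sketch of MediaPipe Tasks in Python (paths are placeholders).
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

# Configure the object detection task with a downloaded .tflite model.
options = vision.ObjectDetectorOptions(
    base_options=python.BaseOptions(model_asset_path='detector.tflite'),
    score_threshold=0.5)

# Run detection on a single image and print the top category per detection.
with vision.ObjectDetector.create_from_options(options) as detector:
    image = mp.Image.create_from_file('image.jpg')
    result = detector.detect(image)
    for detection in result.detections:
        print(detection.categories[0].category_name, detection.bounding_box)
```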
-## ML solutions in MediaPipe +## Solutions -Face Detection | Face Mesh | Iris | Hands | Pose | Holistic -:----------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------: | :------: -[![face_detection](https://mediapipe.dev/images/mobile/face_detection_android_gpu_small.gif)](https://google.github.io/mediapipe/solutions/face_detection) | [![face_mesh](https://mediapipe.dev/images/mobile/face_mesh_android_gpu_small.gif)](https://google.github.io/mediapipe/solutions/face_mesh) | [![iris](https://mediapipe.dev/images/mobile/iris_tracking_android_gpu_small.gif)](https://google.github.io/mediapipe/solutions/iris) | [![hand](https://mediapipe.dev/images/mobile/hand_tracking_android_gpu_small.gif)](https://google.github.io/mediapipe/solutions/hands) | [![pose](https://mediapipe.dev/images/mobile/pose_tracking_android_gpu_small.gif)](https://google.github.io/mediapipe/solutions/pose) | [![hair_segmentation](https://mediapipe.dev/images/mobile/holistic_tracking_android_gpu_small.gif)](https://google.github.io/mediapipe/solutions/holistic) +MediaPipe Solutions provides a suite of libraries and tools for you to quickly +apply artificial intelligence (AI) and machine learning (ML) techniques in your +applications. You can plug these solutions into your applications immediately, +customize them to your needs, and use them across multiple development +platforms. MediaPipe Solutions is part of the MediaPipe [open source +project](https://github.com/google/mediapipe), so you can further customize the +solutions code to meet your application needs. 
-Hair Segmentation | Object Detection | Box Tracking | Instant Motion Tracking | Objectron | KNIFT -:-------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------: | :---: -[![hair_segmentation](https://mediapipe.dev/images/mobile/hair_segmentation_android_gpu_small.gif)](https://google.github.io/mediapipe/solutions/hair_segmentation) | [![object_detection](https://mediapipe.dev/images/mobile/object_detection_android_gpu_small.gif)](https://google.github.io/mediapipe/solutions/object_detection) | [![box_tracking](https://mediapipe.dev/images/mobile/object_tracking_android_gpu_small.gif)](https://google.github.io/mediapipe/solutions/box_tracking) | [![instant_motion_tracking](https://mediapipe.dev/images/mobile/instant_motion_tracking_android_small.gif)](https://google.github.io/mediapipe/solutions/instant_motion_tracking) | [![objectron](https://mediapipe.dev/images/mobile/objectron_chair_android_gpu_small.gif)](https://google.github.io/mediapipe/solutions/objectron) | [![knift](https://mediapipe.dev/images/mobile/template_matching_android_cpu_small.gif)](https://google.github.io/mediapipe/solutions/knift) +These libraries and resources provide the core functionality for each MediaPipe +Solution: - - +* **MediaPipe Tasks**: Cross-platform APIs and libraries for deploying + solutions. [Learn + more](https://developers.google.com/mediapipe/solutions/tasks). +* **MediaPipe models**: Pre-trained, ready-to-run models for use with each + solution. 
-[]() | [Android](https://google.github.io/mediapipe/getting_started/android) | [iOS](https://google.github.io/mediapipe/getting_started/ios) | [C++](https://google.github.io/mediapipe/getting_started/cpp) | [Python](https://google.github.io/mediapipe/getting_started/python) | [JS](https://google.github.io/mediapipe/getting_started/javascript) | [Coral](https://github.com/google/mediapipe/tree/master/mediapipe/examples/coral/README.md) -:---------------------------------------------------------------------------------------- | :-------------------------------------------------------------: | :-----------------------------------------------------: | :-----------------------------------------------------: | :-----------------------------------------------------------: | :-----------------------------------------------------------: | :--------------------------------------------------------------------: -[Face Detection](https://google.github.io/mediapipe/solutions/face_detection) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ -[Face Mesh](https://google.github.io/mediapipe/solutions/face_mesh) | ✅ | ✅ | ✅ | ✅ | ✅ | -[Iris](https://google.github.io/mediapipe/solutions/iris) | ✅ | ✅ | ✅ | | | -[Hands](https://google.github.io/mediapipe/solutions/hands) | ✅ | ✅ | ✅ | ✅ | ✅ | -[Pose](https://google.github.io/mediapipe/solutions/pose) | ✅ | ✅ | ✅ | ✅ | ✅ | -[Holistic](https://google.github.io/mediapipe/solutions/holistic) | ✅ | ✅ | ✅ | ✅ | ✅ | -[Selfie Segmentation](https://google.github.io/mediapipe/solutions/selfie_segmentation) | ✅ | ✅ | ✅ | ✅ | ✅ | -[Hair Segmentation](https://google.github.io/mediapipe/solutions/hair_segmentation) | ✅ | | ✅ | | | -[Object Detection](https://google.github.io/mediapipe/solutions/object_detection) | ✅ | ✅ | ✅ | | | ✅ -[Box Tracking](https://google.github.io/mediapipe/solutions/box_tracking) | ✅ | ✅ | ✅ | | | -[Instant Motion Tracking](https://google.github.io/mediapipe/solutions/instant_motion_tracking) | ✅ | | | | | -[Objectron](https://google.github.io/mediapipe/solutions/objectron) | ✅ | | ✅ | ✅ | ✅ | -[KNIFT](https://google.github.io/mediapipe/solutions/knift) | ✅ | | | | | -[AutoFlip](https://google.github.io/mediapipe/solutions/autoflip) | | | ✅ | | | -[MediaSequence](https://google.github.io/mediapipe/solutions/media_sequence) | | | ✅ | | | -[YouTube 8M](https://google.github.io/mediapipe/solutions/youtube_8m) | | | ✅ | | | +These tools let you customize and evaluate solutions: -See also -[MediaPipe Models and Model Cards](https://google.github.io/mediapipe/solutions/models) -for ML models released in MediaPipe. +* **MediaPipe Model Maker**: Customize models for solutions with your data. + [Learn more](https://developers.google.com/mediapipe/solutions/model_maker). +* **MediaPipe Studio**: Visualize, evaluate, and benchmark solutions in your + browser. [Learn + more](https://developers.google.com/mediapipe/solutions/studio). -## Getting started +### Legacy solutions -To start using MediaPipe -[solutions](https://google.github.io/mediapipe/solutions/solutions) with only a few -lines code, see example code and demos in -[MediaPipe in Python](https://google.github.io/mediapipe/getting_started/python) and -[MediaPipe in JavaScript](https://google.github.io/mediapipe/getting_started/javascript). +We have ended support for [these MediaPipe Legacy Solutions](https://developers.google.com/mediapipe/solutions/guide#legacy) +as of March 1, 2023. All other MediaPipe Legacy Solutions will be upgraded to +a new MediaPipe Solution. 
See the [Solutions guide](https://developers.google.com/mediapipe/solutions/guide#legacy) +for details. The [code repository](https://github.com/google/mediapipe/tree/master/mediapipe) +and prebuilt binaries for all MediaPipe Legacy Solutions will continue to be +provided on an as-is basis. -To use MediaPipe in C++, Android and iOS, which allow further customization of -the [solutions](https://google.github.io/mediapipe/solutions/solutions) as well as -building your own, learn how to -[install](https://google.github.io/mediapipe/getting_started/install) MediaPipe and -start building example applications in -[C++](https://google.github.io/mediapipe/getting_started/cpp), -[Android](https://google.github.io/mediapipe/getting_started/android) and -[iOS](https://google.github.io/mediapipe/getting_started/ios). +For more on the legacy solutions, see the [documentation](https://github.com/google/mediapipe/tree/master/docs/solutions). -The source code is hosted in the -[MediaPipe Github repository](https://github.com/google/mediapipe), and you can -run code search using -[Google Open Source Code Search](https://cs.opensource.google/mediapipe/mediapipe). +## Framework -## Publications +To start using MediaPipe Framework, [install MediaPipe +Framework](https://developers.google.com/mediapipe/framework/getting_started/install) +and start building example applications in C++, Android, and iOS. + +[MediaPipe Framework](https://developers.google.com/mediapipe/framework) is the +low-level component used to build efficient on-device machine learning +pipelines, similar to the premade MediaPipe Solutions. + +Before using MediaPipe Framework, familiarize yourself with the following key +[Framework +concepts](https://developers.google.com/mediapipe/framework/framework_concepts/overview.md): + +* [Packets](https://developers.google.com/mediapipe/framework/framework_concepts/packets.md) +* [Graphs](https://developers.google.com/mediapipe/framework/framework_concepts/graphs.md) +* [Calculators](https://developers.google.com/mediapipe/framework/framework_concepts/calculators.md) + +## Community + +* [Slack community](https://mediapipe.page.link/joinslack) for MediaPipe + users. +* [Discuss](https://groups.google.com/forum/#!forum/mediapipe) - General + community discussion around MediaPipe. +* [Awesome MediaPipe](https://mediapipe.page.link/awesome-mediapipe) - A + curated list of awesome MediaPipe related frameworks, libraries and + software. + +## Contributing + +We welcome contributions. Please follow these +[guidelines](https://github.com/google/mediapipe/blob/master/CONTRIBUTING.md). + +We use GitHub issues for tracking requests and bugs. Please post questions to +the MediaPipe Stack Overflow with a `mediapipe` tag. 
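Returning to the Framework concepts listed above (packets, graphs, calculators), the fragment below spells them out as a tiny graph configuration held in a Python constant. This is an illustrative sketch only: `PassThroughCalculator` is a stock MediaPipe calculator, and the pbtxt text would be handed to whichever graph-running API your platform provides (see the Framework documentation linked above).

```python
# A minimal CalculatorGraphConfig in pbtxt form, kept as a Python constant for
# illustration: packets enter on "input_packets", flow through a single
# calculator node, and leave on "output_packets".
PASS_THROUGH_GRAPH_PBTXT = """
input_stream: "input_packets"
output_stream: "output_packets"
node {
  calculator: "PassThroughCalculator"
  input_stream: "input_packets"
  output_stream: "output_packets"
}
"""
```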
+ +## Resources + +### Publications * [Bringing artworks to life with AR](https://developers.googleblog.com/2021/07/bringing-artworks-to-life-with-ar.html) in Google Developers Blog @@ -102,7 +125,8 @@ run code search using * [SignAll SDK: Sign language interface using MediaPipe is now available for developers](https://developers.googleblog.com/2021/04/signall-sdk-sign-language-interface-using-mediapipe-now-available.html) in Google Developers Blog -* [MediaPipe Holistic - Simultaneous Face, Hand and Pose Prediction, on Device](https://ai.googleblog.com/2020/12/mediapipe-holistic-simultaneous-face.html) +* [MediaPipe Holistic - Simultaneous Face, Hand and Pose Prediction, on + Device](https://ai.googleblog.com/2020/12/mediapipe-holistic-simultaneous-face.html) in Google AI Blog * [Background Features in Google Meet, Powered by Web ML](https://ai.googleblog.com/2020/10/background-features-in-google-meet.html) in Google AI Blog @@ -130,43 +154,6 @@ run code search using in Google AI Blog * [MediaPipe: A Framework for Building Perception Pipelines](https://arxiv.org/abs/1906.08172) -## Videos +### Videos * [YouTube Channel](https://www.youtube.com/c/MediaPipe) - -## Events - -* [MediaPipe Seattle Meetup, Google Building Waterside, 13 Feb 2020](https://mediapipe.page.link/seattle2020) -* [AI Nextcon 2020, 12-16 Feb 2020, Seattle](http://aisea20.xnextcon.com/) -* [MediaPipe Madrid Meetup, 16 Dec 2019](https://www.meetup.com/Madrid-AI-Developers-Group/events/266329088/) -* [MediaPipe London Meetup, Google 123 Building, 12 Dec 2019](https://www.meetup.com/London-AI-Tech-Talk/events/266329038) -* [ML Conference, Berlin, 11 Dec 2019](https://mlconference.ai/machine-learning-advanced-development/mediapipe-building-real-time-cross-platform-mobile-web-edge-desktop-video-audio-ml-pipelines/) -* [MediaPipe Berlin Meetup, Google Berlin, 11 Dec 2019](https://www.meetup.com/Berlin-AI-Tech-Talk/events/266328794/) -* [The 3rd Workshop on YouTube-8M Large Scale Video Understanding Workshop, - Seoul, Korea ICCV - 2019](https://research.google.com/youtube8m/workshop2019/index.html) -* [AI DevWorld 2019, 10 Oct 2019, San Jose, CA](https://aidevworld.com) -* [Google Industry Workshop at ICIP 2019, 24 Sept 2019, Taipei, Taiwan](http://2019.ieeeicip.org/?action=page4&id=14#Google) - ([presentation](https://docs.google.com/presentation/d/e/2PACX-1vRIBBbO_LO9v2YmvbHHEt1cwyqH6EjDxiILjuT0foXy1E7g6uyh4CesB2DkkEwlRDO9_lWfuKMZx98T/pub?start=false&loop=false&delayms=3000&slide=id.g556cc1a659_0_5)) -* [Open sourced at CVPR 2019, 17~20 June, Long Beach, CA](https://sites.google.com/corp/view/perception-cv4arvr/mediapipe) - -## Community - -* [Awesome MediaPipe](https://mediapipe.page.link/awesome-mediapipe) - A - curated list of awesome MediaPipe related frameworks, libraries and software -* [Slack community](https://mediapipe.page.link/joinslack) for MediaPipe users -* [Discuss](https://groups.google.com/forum/#!forum/mediapipe) - General - community discussion around MediaPipe - -## Alpha disclaimer - -MediaPipe is currently in alpha at v0.7. We may be still making breaking API -changes and expect to get to stable APIs by v1.0. - -## Contributing - -We welcome contributions. Please follow these -[guidelines](https://github.com/google/mediapipe/blob/master/CONTRIBUTING.md). - -We use GitHub issues for tracking requests and bugs. Please post questions to -the MediaPipe Stack Overflow with a `mediapipe` tag. 
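MediaPipe Model Maker, mentioned in the README text above, is also what the object detector changes later in this diff feed into (new default hyperparameters and an NMS-free TFLite export). Below is a rough sketch of the retraining flow it supports; the dataset loader and the `SupportedModels` value are assumed names taken from the Model Maker guide and may not match this exact revision.

```python
# Rough sketch of customizing the object detector with MediaPipe Model Maker.
# Paths are placeholders; Dataset.from_pascal_voc_folder and MOBILENET_V2 are
# assumed names -- check the Model Maker guide for the exact API.
from mediapipe_model_maker import object_detector

train_data = object_detector.Dataset.from_pascal_voc_folder('data/train')
validation_data = object_detector.Dataset.from_pascal_voc_folder('data/validation')

options = object_detector.ObjectDetectorOptions(
    supported_model=object_detector.SupportedModels.MOBILENET_V2,
    # HParams now defaults to learning_rate=0.3, batch_size=8, epochs=30
    # (see the hyperparameters.py change further down in this diff).
    hparams=object_detector.HParams(export_dir='exported_model'))

model = object_detector.ObjectDetector.create(
    train_data=train_data,
    validation_data=validation_data,
    options=options)
loss, coco_metrics = model.evaluate(validation_data, batch_size=8)
model.export_model('model.tflite')
```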
diff --git a/WORKSPACE b/WORKSPACE index 760898185..ee2506ed7 100644 --- a/WORKSPACE +++ b/WORKSPACE @@ -375,6 +375,18 @@ http_archive( url = "https://github.com/opencv/opencv/releases/download/3.2.0/opencv-3.2.0-ios-framework.zip", ) +# Building an opencv.xcframework from the OpenCV 4.5.1 sources is necessary for +# MediaPipe iOS Task Libraries to be supported on arm64(M1) Macs. An +# `opencv.xcframework` archive has not been released and it is recommended to +# build the same from source using a script provided in OpenCV 4.5.0 upwards. +http_archive( + name = "ios_opencv_source", + sha256 = "5fbc26ee09e148a4d494b225d04217f7c913ca1a4d46115b70cca3565d7bbe05", + build_file = "@//third_party:opencv_ios_source.BUILD", + type = "zip", + url = "https://github.com/opencv/opencv/archive/refs/tags/4.5.1.zip", +) + http_archive( name = "stblib", strip_prefix = "stb-b42009b3b9d4ca35bc703f5310eedc74f584be58", diff --git a/docs/index.md b/docs/index.md index a82c88ab1..cb3d56de6 100644 --- a/docs/index.md +++ b/docs/index.md @@ -4,8 +4,6 @@ title: Home nav_order: 1 --- -![MediaPipe](https://mediapipe.dev/images/mediapipe_small.png) - ---- **Attention:** *Thanks for your interest in MediaPipe! We have moved to @@ -14,86 +12,111 @@ as the primary developer documentation site for MediaPipe as of April 3, 2023.* *This notice and web page will be removed on June 1, 2023.* ----- +![MediaPipe](https://developers.google.com/static/mediapipe/images/home/hero_01_1920.png) -









-









-









+**Attention**: MediaPipe Solutions Preview is an early release. [Learn +more](https://developers.google.com/mediapipe/solutions/about#notice). --------------------------------------------------------------------------------- +**On-device machine learning for everyone** -## Live ML anywhere +Delight your customers with innovative machine learning features. MediaPipe +contains everything that you need to customize and deploy to mobile (Android, +iOS), web, desktop, edge devices, and IoT, effortlessly. -[MediaPipe](https://google.github.io/mediapipe/) offers cross-platform, customizable -ML solutions for live and streaming media. +* [See demos](https://goo.gle/mediapipe-studio) +* [Learn more](https://developers.google.com/mediapipe/solutions) -![accelerated.png](https://mediapipe.dev/images/accelerated_small.png) | ![cross_platform.png](https://mediapipe.dev/images/cross_platform_small.png) -:------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------: -***End-to-End acceleration***: *Built-in fast ML inference and processing accelerated even on common hardware* | ***Build once, deploy anywhere***: *Unified solution works across Android, iOS, desktop/cloud, web and IoT* -![ready_to_use.png](https://mediapipe.dev/images/ready_to_use_small.png) | ![open_source.png](https://mediapipe.dev/images/open_source_small.png) -***Ready-to-use solutions***: *Cutting-edge ML solutions demonstrating full power of the framework* | ***Free and open source***: *Framework and solutions both under Apache 2.0, fully extensible and customizable* +## Get started ----- +You can get started with MediaPipe Solutions by checking out any of the +developer guides for +[vision](https://developers.google.com/mediapipe/solutions/vision/object_detector), +[text](https://developers.google.com/mediapipe/solutions/text/text_classifier), +and +[audio](https://developers.google.com/mediapipe/solutions/audio/audio_classifier) +tasks. If you need help setting up a development environment for use with +MediaPipe Tasks, check out the setup guides for +[Android](https://developers.google.com/mediapipe/solutions/setup_android), [web +apps](https://developers.google.com/mediapipe/solutions/setup_web), and +[Python](https://developers.google.com/mediapipe/solutions/setup_python). 
-## ML solutions in MediaPipe +## Solutions -Face Detection | Face Mesh | Iris | Hands | Pose | Holistic -:----------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------: | :------: -[![face_detection](https://mediapipe.dev/images/mobile/face_detection_android_gpu_small.gif)](https://google.github.io/mediapipe/solutions/face_detection) | [![face_mesh](https://mediapipe.dev/images/mobile/face_mesh_android_gpu_small.gif)](https://google.github.io/mediapipe/solutions/face_mesh) | [![iris](https://mediapipe.dev/images/mobile/iris_tracking_android_gpu_small.gif)](https://google.github.io/mediapipe/solutions/iris) | [![hand](https://mediapipe.dev/images/mobile/hand_tracking_android_gpu_small.gif)](https://google.github.io/mediapipe/solutions/hands) | [![pose](https://mediapipe.dev/images/mobile/pose_tracking_android_gpu_small.gif)](https://google.github.io/mediapipe/solutions/pose) | [![hair_segmentation](https://mediapipe.dev/images/mobile/holistic_tracking_android_gpu_small.gif)](https://google.github.io/mediapipe/solutions/holistic) +MediaPipe Solutions provides a suite of libraries and tools for you to quickly +apply artificial intelligence (AI) and machine learning (ML) techniques in your +applications. You can plug these solutions into your applications immediately, +customize them to your needs, and use them across multiple development +platforms. MediaPipe Solutions is part of the MediaPipe [open source +project](https://github.com/google/mediapipe), so you can further customize the +solutions code to meet your application needs. 
-Hair Segmentation | Object Detection | Box Tracking | Instant Motion Tracking | Objectron | KNIFT -:-------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------: | :---: -[![hair_segmentation](https://mediapipe.dev/images/mobile/hair_segmentation_android_gpu_small.gif)](https://google.github.io/mediapipe/solutions/hair_segmentation) | [![object_detection](https://mediapipe.dev/images/mobile/object_detection_android_gpu_small.gif)](https://google.github.io/mediapipe/solutions/object_detection) | [![box_tracking](https://mediapipe.dev/images/mobile/object_tracking_android_gpu_small.gif)](https://google.github.io/mediapipe/solutions/box_tracking) | [![instant_motion_tracking](https://mediapipe.dev/images/mobile/instant_motion_tracking_android_small.gif)](https://google.github.io/mediapipe/solutions/instant_motion_tracking) | [![objectron](https://mediapipe.dev/images/mobile/objectron_chair_android_gpu_small.gif)](https://google.github.io/mediapipe/solutions/objectron) | [![knift](https://mediapipe.dev/images/mobile/template_matching_android_cpu_small.gif)](https://google.github.io/mediapipe/solutions/knift) +These libraries and resources provide the core functionality for each MediaPipe +Solution: - - +* **MediaPipe Tasks**: Cross-platform APIs and libraries for deploying + solutions. [Learn + more](https://developers.google.com/mediapipe/solutions/tasks). +* **MediaPipe models**: Pre-trained, ready-to-run models for use with each + solution. 
-[]() | [Android](https://google.github.io/mediapipe/getting_started/android) | [iOS](https://google.github.io/mediapipe/getting_started/ios) | [C++](https://google.github.io/mediapipe/getting_started/cpp) | [Python](https://google.github.io/mediapipe/getting_started/python) | [JS](https://google.github.io/mediapipe/getting_started/javascript) | [Coral](https://github.com/google/mediapipe/tree/master/mediapipe/examples/coral/README.md) -:---------------------------------------------------------------------------------------- | :-------------------------------------------------------------: | :-----------------------------------------------------: | :-----------------------------------------------------: | :-----------------------------------------------------------: | :-----------------------------------------------------------: | :--------------------------------------------------------------------: -[Face Detection](https://google.github.io/mediapipe/solutions/face_detection) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ -[Face Mesh](https://google.github.io/mediapipe/solutions/face_mesh) | ✅ | ✅ | ✅ | ✅ | ✅ | -[Iris](https://google.github.io/mediapipe/solutions/iris) | ✅ | ✅ | ✅ | | | -[Hands](https://google.github.io/mediapipe/solutions/hands) | ✅ | ✅ | ✅ | ✅ | ✅ | -[Pose](https://google.github.io/mediapipe/solutions/pose) | ✅ | ✅ | ✅ | ✅ | ✅ | -[Holistic](https://google.github.io/mediapipe/solutions/holistic) | ✅ | ✅ | ✅ | ✅ | ✅ | -[Selfie Segmentation](https://google.github.io/mediapipe/solutions/selfie_segmentation) | ✅ | ✅ | ✅ | ✅ | ✅ | -[Hair Segmentation](https://google.github.io/mediapipe/solutions/hair_segmentation) | ✅ | | ✅ | | | -[Object Detection](https://google.github.io/mediapipe/solutions/object_detection) | ✅ | ✅ | ✅ | | | ✅ -[Box Tracking](https://google.github.io/mediapipe/solutions/box_tracking) | ✅ | ✅ | ✅ | | | -[Instant Motion Tracking](https://google.github.io/mediapipe/solutions/instant_motion_tracking) | ✅ | | | | | -[Objectron](https://google.github.io/mediapipe/solutions/objectron) | ✅ | | ✅ | ✅ | ✅ | -[KNIFT](https://google.github.io/mediapipe/solutions/knift) | ✅ | | | | | -[AutoFlip](https://google.github.io/mediapipe/solutions/autoflip) | | | ✅ | | | -[MediaSequence](https://google.github.io/mediapipe/solutions/media_sequence) | | | ✅ | | | -[YouTube 8M](https://google.github.io/mediapipe/solutions/youtube_8m) | | | ✅ | | | +These tools let you customize and evaluate solutions: -See also -[MediaPipe Models and Model Cards](https://google.github.io/mediapipe/solutions/models) -for ML models released in MediaPipe. +* **MediaPipe Model Maker**: Customize models for solutions with your data. + [Learn more](https://developers.google.com/mediapipe/solutions/model_maker). +* **MediaPipe Studio**: Visualize, evaluate, and benchmark solutions in your + browser. [Learn + more](https://developers.google.com/mediapipe/solutions/studio). -## Getting started +### Legacy solutions -To start using MediaPipe -[solutions](https://google.github.io/mediapipe/solutions/solutions) with only a few -lines code, see example code and demos in -[MediaPipe in Python](https://google.github.io/mediapipe/getting_started/python) and -[MediaPipe in JavaScript](https://google.github.io/mediapipe/getting_started/javascript). +We have ended support for [these MediaPipe Legacy Solutions](https://developers.google.com/mediapipe/solutions/guide#legacy) +as of March 1, 2023. All other MediaPipe Legacy Solutions will be upgraded to +a new MediaPipe Solution. 
See the [Solutions guide](https://developers.google.com/mediapipe/solutions/guide#legacy) +for details. The [code repository](https://github.com/google/mediapipe/tree/master/mediapipe) +and prebuilt binaries for all MediaPipe Legacy Solutions will continue to be +provided on an as-is basis. -To use MediaPipe in C++, Android and iOS, which allow further customization of -the [solutions](https://google.github.io/mediapipe/solutions/solutions) as well as -building your own, learn how to -[install](https://google.github.io/mediapipe/getting_started/install) MediaPipe and -start building example applications in -[C++](https://google.github.io/mediapipe/getting_started/cpp), -[Android](https://google.github.io/mediapipe/getting_started/android) and -[iOS](https://google.github.io/mediapipe/getting_started/ios). +For more on the legacy solutions, see the [documentation](https://github.com/google/mediapipe/tree/master/docs/solutions). -The source code is hosted in the -[MediaPipe Github repository](https://github.com/google/mediapipe), and you can -run code search using -[Google Open Source Code Search](https://cs.opensource.google/mediapipe/mediapipe). +## Framework -## Publications +To start using MediaPipe Framework, [install MediaPipe +Framework](https://developers.google.com/mediapipe/framework/getting_started/install) +and start building example applications in C++, Android, and iOS. + +[MediaPipe Framework](https://developers.google.com/mediapipe/framework) is the +low-level component used to build efficient on-device machine learning +pipelines, similar to the premade MediaPipe Solutions. + +Before using MediaPipe Framework, familiarize yourself with the following key +[Framework +concepts](https://developers.google.com/mediapipe/framework/framework_concepts/overview.md): + +* [Packets](https://developers.google.com/mediapipe/framework/framework_concepts/packets.md) +* [Graphs](https://developers.google.com/mediapipe/framework/framework_concepts/graphs.md) +* [Calculators](https://developers.google.com/mediapipe/framework/framework_concepts/calculators.md) + +## Community + +* [Slack community](https://mediapipe.page.link/joinslack) for MediaPipe + users. +* [Discuss](https://groups.google.com/forum/#!forum/mediapipe) - General + community discussion around MediaPipe. +* [Awesome MediaPipe](https://mediapipe.page.link/awesome-mediapipe) - A + curated list of awesome MediaPipe related frameworks, libraries and + software. + +## Contributing + +We welcome contributions. Please follow these +[guidelines](https://github.com/google/mediapipe/blob/master/CONTRIBUTING.md). + +We use GitHub issues for tracking requests and bugs. Please post questions to +the MediaPipe Stack Overflow with a `mediapipe` tag. 
+ +## Resources + +### Publications * [Bringing artworks to life with AR](https://developers.googleblog.com/2021/07/bringing-artworks-to-life-with-ar.html) in Google Developers Blog @@ -102,7 +125,8 @@ run code search using * [SignAll SDK: Sign language interface using MediaPipe is now available for developers](https://developers.googleblog.com/2021/04/signall-sdk-sign-language-interface-using-mediapipe-now-available.html) in Google Developers Blog -* [MediaPipe Holistic - Simultaneous Face, Hand and Pose Prediction, on Device](https://ai.googleblog.com/2020/12/mediapipe-holistic-simultaneous-face.html) +* [MediaPipe Holistic - Simultaneous Face, Hand and Pose Prediction, on + Device](https://ai.googleblog.com/2020/12/mediapipe-holistic-simultaneous-face.html) in Google AI Blog * [Background Features in Google Meet, Powered by Web ML](https://ai.googleblog.com/2020/10/background-features-in-google-meet.html) in Google AI Blog @@ -130,43 +154,6 @@ run code search using in Google AI Blog * [MediaPipe: A Framework for Building Perception Pipelines](https://arxiv.org/abs/1906.08172) -## Videos +### Videos * [YouTube Channel](https://www.youtube.com/c/MediaPipe) - -## Events - -* [MediaPipe Seattle Meetup, Google Building Waterside, 13 Feb 2020](https://mediapipe.page.link/seattle2020) -* [AI Nextcon 2020, 12-16 Feb 2020, Seattle](http://aisea20.xnextcon.com/) -* [MediaPipe Madrid Meetup, 16 Dec 2019](https://www.meetup.com/Madrid-AI-Developers-Group/events/266329088/) -* [MediaPipe London Meetup, Google 123 Building, 12 Dec 2019](https://www.meetup.com/London-AI-Tech-Talk/events/266329038) -* [ML Conference, Berlin, 11 Dec 2019](https://mlconference.ai/machine-learning-advanced-development/mediapipe-building-real-time-cross-platform-mobile-web-edge-desktop-video-audio-ml-pipelines/) -* [MediaPipe Berlin Meetup, Google Berlin, 11 Dec 2019](https://www.meetup.com/Berlin-AI-Tech-Talk/events/266328794/) -* [The 3rd Workshop on YouTube-8M Large Scale Video Understanding Workshop, - Seoul, Korea ICCV - 2019](https://research.google.com/youtube8m/workshop2019/index.html) -* [AI DevWorld 2019, 10 Oct 2019, San Jose, CA](https://aidevworld.com) -* [Google Industry Workshop at ICIP 2019, 24 Sept 2019, Taipei, Taiwan](http://2019.ieeeicip.org/?action=page4&id=14#Google) - ([presentation](https://docs.google.com/presentation/d/e/2PACX-1vRIBBbO_LO9v2YmvbHHEt1cwyqH6EjDxiILjuT0foXy1E7g6uyh4CesB2DkkEwlRDO9_lWfuKMZx98T/pub?start=false&loop=false&delayms=3000&slide=id.g556cc1a659_0_5)) -* [Open sourced at CVPR 2019, 17~20 June, Long Beach, CA](https://sites.google.com/corp/view/perception-cv4arvr/mediapipe) - -## Community - -* [Awesome MediaPipe](https://mediapipe.page.link/awesome-mediapipe) - A - curated list of awesome MediaPipe related frameworks, libraries and software -* [Slack community](https://mediapipe.page.link/joinslack) for MediaPipe users -* [Discuss](https://groups.google.com/forum/#!forum/mediapipe) - General - community discussion around MediaPipe - -## Alpha disclaimer - -MediaPipe is currently in alpha at v0.7. We may be still making breaking API -changes and expect to get to stable APIs by v1.0. - -## Contributing - -We welcome contributions. Please follow these -[guidelines](https://github.com/google/mediapipe/blob/master/CONTRIBUTING.md). - -We use GitHub issues for tracking requests and bugs. Please post questions to -the MediaPipe Stack Overflow with a `mediapipe` tag. 
diff --git a/mediapipe/BUILD b/mediapipe/BUILD index 3187c0cf7..fd0cbab36 100644 --- a/mediapipe/BUILD +++ b/mediapipe/BUILD @@ -141,6 +141,7 @@ config_setting( "ios_armv7", "ios_arm64", "ios_arm64e", + "ios_sim_arm64", ] ] diff --git a/mediapipe/framework/BUILD b/mediapipe/framework/BUILD index ae788ed58..126261c90 100644 --- a/mediapipe/framework/BUILD +++ b/mediapipe/framework/BUILD @@ -33,7 +33,9 @@ bzl_library( srcs = [ "transitive_protos.bzl", ], - visibility = ["//mediapipe/framework:__subpackages__"], + visibility = [ + "//mediapipe/framework:__subpackages__", + ], ) bzl_library( diff --git a/mediapipe/framework/calculator_options.proto b/mediapipe/framework/calculator_options.proto index 747e9c4af..3bc9f6615 100644 --- a/mediapipe/framework/calculator_options.proto +++ b/mediapipe/framework/calculator_options.proto @@ -23,15 +23,13 @@ package mediapipe; option java_package = "com.google.mediapipe.proto"; option java_outer_classname = "CalculatorOptionsProto"; -// Options for Calculators. Each Calculator implementation should -// have its own options proto, which should look like this: +// Options for Calculators, DEPRECATED. New calculators are encouraged to use +// proto3 syntax options: // // message MyCalculatorOptions { -// extend CalculatorOptions { -// optional MyCalculatorOptions ext = ; -// } -// optional string field_needed_by_my_calculator = 1; -// optional int32 another_field = 2; +// // proto3 does not expect "optional" +// string field_needed_by_my_calculator = 1; +// int32 another_field = 2; // // etc // } message CalculatorOptions { diff --git a/mediapipe/framework/profiler/testing/BUILD b/mediapipe/framework/profiler/testing/BUILD index 0b0d256e5..67668ef7d 100644 --- a/mediapipe/framework/profiler/testing/BUILD +++ b/mediapipe/framework/profiler/testing/BUILD @@ -15,9 +15,7 @@ licenses(["notice"]) -package( - default_visibility = ["//mediapipe/framework:__subpackages__"], -) +package(default_visibility = ["//mediapipe/framework:__subpackages__"]) cc_library( name = "simple_calculator", diff --git a/mediapipe/framework/tool/template_parser.cc b/mediapipe/framework/tool/template_parser.cc index f012ac418..ad799c34f 100644 --- a/mediapipe/framework/tool/template_parser.cc +++ b/mediapipe/framework/tool/template_parser.cc @@ -974,7 +974,7 @@ class TemplateParser::Parser::ParserImpl { } // Consumes an identifier and saves its value in the identifier parameter. - // Returns false if the token is not of type IDENTFIER. + // Returns false if the token is not of type IDENTIFIER. 
bool ConsumeIdentifier(std::string* identifier) { if (LookingAtType(io::Tokenizer::TYPE_IDENTIFIER)) { *identifier = tokenizer_.current().text; @@ -1672,7 +1672,9 @@ class TemplateParser::Parser::MediaPipeParserImpl if (field_type == ProtoUtilLite::FieldType::TYPE_MESSAGE) { *args = {""}; } else { - MEDIAPIPE_CHECK_OK(ProtoUtilLite::Serialize({"1"}, field_type, args)); + constexpr char kPlaceholderValue[] = "1"; + MEDIAPIPE_CHECK_OK( + ProtoUtilLite::Serialize({kPlaceholderValue}, field_type, args)); } } diff --git a/mediapipe/model_maker/models/gesture_recognizer/BUILD b/mediapipe/model_maker/models/gesture_recognizer/BUILD index 947508f1b..c57d7a2c9 100644 --- a/mediapipe/model_maker/models/gesture_recognizer/BUILD +++ b/mediapipe/model_maker/models/gesture_recognizer/BUILD @@ -19,9 +19,7 @@ load( licenses(["notice"]) -package( - default_visibility = ["//mediapipe/model_maker/python/vision/gesture_recognizer:__subpackages__"], -) +package(default_visibility = ["//mediapipe/model_maker/python/vision/gesture_recognizer:__subpackages__"]) mediapipe_files( srcs = [ diff --git a/mediapipe/model_maker/models/text_classifier/BUILD b/mediapipe/model_maker/models/text_classifier/BUILD index d9d55048d..460d6cfd1 100644 --- a/mediapipe/model_maker/models/text_classifier/BUILD +++ b/mediapipe/model_maker/models/text_classifier/BUILD @@ -19,9 +19,7 @@ load( licenses(["notice"]) -package( - default_visibility = ["//mediapipe/model_maker/python/text/text_classifier:__subpackages__"], -) +package(default_visibility = ["//mediapipe/model_maker/python/text/text_classifier:__subpackages__"]) mediapipe_files( srcs = [ diff --git a/mediapipe/model_maker/python/core/BUILD b/mediapipe/model_maker/python/core/BUILD index 6331e638e..0ed20a2fe 100644 --- a/mediapipe/model_maker/python/core/BUILD +++ b/mediapipe/model_maker/python/core/BUILD @@ -14,9 +14,7 @@ # Placeholder for internal Python strict library and test compatibility macro. -package( - default_visibility = ["//mediapipe:__subpackages__"], -) +package(default_visibility = ["//mediapipe:__subpackages__"]) licenses(["notice"]) diff --git a/mediapipe/model_maker/python/core/data/BUILD b/mediapipe/model_maker/python/core/data/BUILD index cc0381f60..1c2fb7a44 100644 --- a/mediapipe/model_maker/python/core/data/BUILD +++ b/mediapipe/model_maker/python/core/data/BUILD @@ -17,9 +17,7 @@ licenses(["notice"]) -package( - default_visibility = ["//mediapipe:__subpackages__"], -) +package(default_visibility = ["//mediapipe:__subpackages__"]) py_library( name = "data_util", diff --git a/mediapipe/model_maker/python/core/tasks/BUILD b/mediapipe/model_maker/python/core/tasks/BUILD index 6a3e60c97..818d78feb 100644 --- a/mediapipe/model_maker/python/core/tasks/BUILD +++ b/mediapipe/model_maker/python/core/tasks/BUILD @@ -15,9 +15,7 @@ # Placeholder for internal Python strict library and test compatibility macro. # Placeholder for internal Python strict test compatibility macro. 
-package( - default_visibility = ["//mediapipe:__subpackages__"], -) +package(default_visibility = ["//mediapipe:__subpackages__"]) licenses(["notice"]) diff --git a/mediapipe/model_maker/python/core/utils/BUILD b/mediapipe/model_maker/python/core/utils/BUILD index e86cbb1e3..ef9cab290 100644 --- a/mediapipe/model_maker/python/core/utils/BUILD +++ b/mediapipe/model_maker/python/core/utils/BUILD @@ -17,9 +17,7 @@ licenses(["notice"]) -package( - default_visibility = ["//mediapipe:__subpackages__"], -) +package(default_visibility = ["//mediapipe:__subpackages__"]) py_library( name = "test_util", diff --git a/mediapipe/model_maker/python/text/core/BUILD b/mediapipe/model_maker/python/text/core/BUILD index 3ba4e8e6e..d99f46b77 100644 --- a/mediapipe/model_maker/python/text/core/BUILD +++ b/mediapipe/model_maker/python/text/core/BUILD @@ -14,9 +14,7 @@ # Placeholder for internal Python strict library and test compatibility macro. -package( - default_visibility = ["//mediapipe:__subpackages__"], -) +package(default_visibility = ["//mediapipe:__subpackages__"]) licenses(["notice"]) diff --git a/mediapipe/model_maker/python/text/text_classifier/BUILD b/mediapipe/model_maker/python/text/text_classifier/BUILD index 1ae3e2873..9fe96849b 100644 --- a/mediapipe/model_maker/python/text/text_classifier/BUILD +++ b/mediapipe/model_maker/python/text/text_classifier/BUILD @@ -15,9 +15,7 @@ # Placeholder for internal Python strict library and test compatibility macro. # Placeholder for internal Python strict test compatibility macro. -package( - default_visibility = ["//mediapipe:__subpackages__"], -) +package(default_visibility = ["//mediapipe:__subpackages__"]) licenses(["notice"]) diff --git a/mediapipe/model_maker/python/vision/BUILD b/mediapipe/model_maker/python/vision/BUILD index 4410d859f..b7d0d13a6 100644 --- a/mediapipe/model_maker/python/vision/BUILD +++ b/mediapipe/model_maker/python/vision/BUILD @@ -12,8 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -package( - default_visibility = ["//mediapipe:__subpackages__"], -) +package(default_visibility = ["//mediapipe:__subpackages__"]) licenses(["notice"]) diff --git a/mediapipe/model_maker/python/vision/gesture_recognizer/BUILD b/mediapipe/model_maker/python/vision/gesture_recognizer/BUILD index 578723fb0..27f8934b3 100644 --- a/mediapipe/model_maker/python/vision/gesture_recognizer/BUILD +++ b/mediapipe/model_maker/python/vision/gesture_recognizer/BUILD @@ -17,9 +17,7 @@ licenses(["notice"]) -package( - default_visibility = ["//mediapipe:__subpackages__"], -) +package(default_visibility = ["//mediapipe:__subpackages__"]) # TODO: Remove the unnecessary test data once the demo data are moved to an open-sourced # directory. diff --git a/mediapipe/model_maker/python/vision/image_classifier/BUILD b/mediapipe/model_maker/python/vision/image_classifier/BUILD index 3b6d7551a..73d1d2f7c 100644 --- a/mediapipe/model_maker/python/vision/image_classifier/BUILD +++ b/mediapipe/model_maker/python/vision/image_classifier/BUILD @@ -17,9 +17,7 @@ licenses(["notice"]) -package( - default_visibility = ["//mediapipe:__subpackages__"], -) +package(default_visibility = ["//mediapipe:__subpackages__"]) ###################################################################### # Public target of the MediaPipe Model Maker ImageClassifier APIs. 
diff --git a/mediapipe/model_maker/python/vision/object_detector/BUILD b/mediapipe/model_maker/python/vision/object_detector/BUILD index f9c3f00fc..75c08dbc8 100644 --- a/mediapipe/model_maker/python/vision/object_detector/BUILD +++ b/mediapipe/model_maker/python/vision/object_detector/BUILD @@ -17,9 +17,7 @@ licenses(["notice"]) -package( - default_visibility = ["//mediapipe:__subpackages__"], -) +package(default_visibility = ["//mediapipe:__subpackages__"]) py_library( name = "object_detector_import", @@ -88,6 +86,17 @@ py_test( ], ) +py_library( + name = "detection", + srcs = ["detection.py"], +) + +py_test( + name = "detection_test", + srcs = ["detection_test.py"], + deps = [":detection"], +) + py_library( name = "hyperparameters", srcs = ["hyperparameters.py"], @@ -116,6 +125,7 @@ py_library( name = "model", srcs = ["model.py"], deps = [ + ":detection", ":model_options", ":model_spec", ], @@ -163,6 +173,7 @@ py_library( "//mediapipe/model_maker/python/core/tasks:classifier", "//mediapipe/model_maker/python/core/utils:model_util", "//mediapipe/model_maker/python/core/utils:quantization", + "//mediapipe/tasks/python/metadata/metadata_writers:metadata_info", "//mediapipe/tasks/python/metadata/metadata_writers:metadata_writer", "//mediapipe/tasks/python/metadata/metadata_writers:object_detector", ], diff --git a/mediapipe/model_maker/python/vision/object_detector/__init__.py b/mediapipe/model_maker/python/vision/object_detector/__init__.py index 4670b343c..3e0a62bf8 100644 --- a/mediapipe/model_maker/python/vision/object_detector/__init__.py +++ b/mediapipe/model_maker/python/vision/object_detector/__init__.py @@ -32,6 +32,7 @@ ObjectDetectorOptions = object_detector_options.ObjectDetectorOptions # Remove duplicated and non-public API del dataset del dataset_util # pylint: disable=undefined-variable +del detection # pylint: disable=undefined-variable del hyperparameters del model # pylint: disable=undefined-variable del model_options diff --git a/mediapipe/model_maker/python/vision/object_detector/dataset.py b/mediapipe/model_maker/python/vision/object_detector/dataset.py index 6899d8612..c18a071b2 100644 --- a/mediapipe/model_maker/python/vision/object_detector/dataset.py +++ b/mediapipe/model_maker/python/vision/object_detector/dataset.py @@ -106,7 +106,7 @@ class Dataset(classification_dataset.ClassificationDataset): ... Each .xml annotation file should have the following format: - file0.jpg + file0.jpg kangaroo @@ -114,6 +114,7 @@ class Dataset(classification_dataset.ClassificationDataset): 89 386 262 + ... diff --git a/mediapipe/model_maker/python/vision/object_detector/detection.py b/mediapipe/model_maker/python/vision/object_detector/detection.py new file mode 100644 index 000000000..769189b24 --- /dev/null +++ b/mediapipe/model_maker/python/vision/object_detector/detection.py @@ -0,0 +1,34 @@ +# Copyright 2023 The MediaPipe Authors. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+"""Custom Detection export module for Object Detection.""" + +from typing import Any, Mapping + +from official.vision.serving import detection + + +class DetectionModule(detection.DetectionModule): + """A serving detection module for exporting the model. + + This module overrides the tensorflow_models DetectionModule by only outputting + the pre-nms detection_boxes and detection_scores. + """ + + def serve(self, images) -> Mapping[str, Any]: + result = super().serve(images) + final_outputs = { + 'detection_boxes': result['detection_boxes'], + 'detection_scores': result['detection_scores'], + } + return final_outputs diff --git a/mediapipe/model_maker/python/vision/object_detector/detection_test.py b/mediapipe/model_maker/python/vision/object_detector/detection_test.py new file mode 100644 index 000000000..34f16c21c --- /dev/null +++ b/mediapipe/model_maker/python/vision/object_detector/detection_test.py @@ -0,0 +1,73 @@ +# Copyright 2023 The MediaPipe Authors. +# +# Licensed under the Apache License, Version 2.0 (the 'License'); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an 'AS IS' BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from unittest import mock +import tensorflow as tf + +from mediapipe.model_maker.python.vision.object_detector import detection +from official.core import config_definitions as cfg +from official.vision import configs +from official.vision.serving import detection as detection_module + + +class ObjectDetectorTest(tf.test.TestCase): + + @mock.patch.object(detection_module.DetectionModule, 'serve', autospec=True) + def test_detection_module(self, mock_serve): + mock_serve.return_value = { + 'detection_boxes': 1, + 'detection_scores': 2, + 'detection_classes': 3, + 'num_detections': 4, + } + model_config = configs.retinanet.RetinaNet( + min_level=3, + max_level=7, + num_classes=10, + input_size=[256, 256, 3], + anchor=configs.retinanet.Anchor( + num_scales=3, aspect_ratios=[0.5, 1.0, 2.0], anchor_size=3 + ), + backbone=configs.backbones.Backbone( + type='mobilenet', mobilenet=configs.backbones.MobileNet() + ), + decoder=configs.decoders.Decoder( + type='fpn', + fpn=configs.decoders.FPN( + num_filters=128, use_separable_conv=True, use_keras_layer=True + ), + ), + head=configs.retinanet.RetinaNetHead( + num_filters=128, use_separable_conv=True + ), + detection_generator=configs.retinanet.DetectionGenerator(), + norm_activation=configs.common.NormActivation(activation='relu6'), + ) + task_config = configs.retinanet.RetinaNetTask(model=model_config) + params = cfg.ExperimentConfig( + task=task_config, + ) + detection_instance = detection.DetectionModule( + params=params, batch_size=1, input_image_size=[256, 256] + ) + outputs = detection_instance.serve(0) + expected_outputs = { + 'detection_boxes': 1, + 'detection_scores': 2, + } + self.assertAllEqual(outputs, expected_outputs) + + +if __name__ == '__main__': + tf.test.main() diff --git a/mediapipe/model_maker/python/vision/object_detector/hyperparameters.py b/mediapipe/model_maker/python/vision/object_detector/hyperparameters.py index 1bc7514f2..35fb630ae 100644 --- a/mediapipe/model_maker/python/vision/object_detector/hyperparameters.py +++ 
b/mediapipe/model_maker/python/vision/object_detector/hyperparameters.py @@ -27,8 +27,6 @@ class HParams(hp.BaseHParams): learning_rate: Learning rate to use for gradient descent training. batch_size: Batch size for training. epochs: Number of training iterations over the dataset. - do_fine_tuning: If true, the base module is trained together with the - classification layer on top. cosine_decay_epochs: The number of epochs for cosine decay learning rate. See https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/schedules/CosineDecay @@ -39,13 +37,13 @@ class HParams(hp.BaseHParams): """ # Parameters from BaseHParams class. - learning_rate: float = 0.003 - batch_size: int = 32 - epochs: int = 10 + learning_rate: float = 0.3 + batch_size: int = 8 + epochs: int = 30 # Parameters for cosine learning rate decay cosine_decay_epochs: Optional[int] = None - cosine_decay_alpha: float = 0.0 + cosine_decay_alpha: float = 1.0 @dataclasses.dataclass @@ -67,8 +65,8 @@ class QATHParams: for more information. """ - learning_rate: float = 0.03 - batch_size: int = 32 - epochs: int = 10 - decay_steps: int = 231 + learning_rate: float = 0.3 + batch_size: int = 8 + epochs: int = 15 + decay_steps: int = 8 decay_rate: float = 0.96 diff --git a/mediapipe/model_maker/python/vision/object_detector/model.py b/mediapipe/model_maker/python/vision/object_detector/model.py index e3eb3a651..70e63d5b5 100644 --- a/mediapipe/model_maker/python/vision/object_detector/model.py +++ b/mediapipe/model_maker/python/vision/object_detector/model.py @@ -18,6 +18,7 @@ from typing import Mapping, Optional, Sequence, Union import tensorflow as tf +from mediapipe.model_maker.python.vision.object_detector import detection from mediapipe.model_maker.python.vision.object_detector import model_options as model_opt from mediapipe.model_maker.python.vision.object_detector import model_spec as ms from official.core import config_definitions as cfg @@ -29,7 +30,6 @@ from official.vision.losses import loss_utils from official.vision.modeling import factory from official.vision.modeling import retinanet_model from official.vision.modeling.layers import detection_generator -from official.vision.serving import detection class ObjectDetectorModel(tf.keras.Model): @@ -199,6 +199,7 @@ class ObjectDetectorModel(tf.keras.Model): max_detections=10, max_classes_per_detection=1, normalize_anchor_coordinates=True, + omit_nms=True, ), ) tflite_post_processing_config = ( diff --git a/mediapipe/model_maker/python/vision/object_detector/object_detector.py b/mediapipe/model_maker/python/vision/object_detector/object_detector.py index 746eef1b3..486c3ffa9 100644 --- a/mediapipe/model_maker/python/vision/object_detector/object_detector.py +++ b/mediapipe/model_maker/python/vision/object_detector/object_detector.py @@ -28,6 +28,7 @@ from mediapipe.model_maker.python.vision.object_detector import model_options as from mediapipe.model_maker.python.vision.object_detector import model_spec as ms from mediapipe.model_maker.python.vision.object_detector import object_detector_options from mediapipe.model_maker.python.vision.object_detector import preprocessor +from mediapipe.tasks.python.metadata.metadata_writers import metadata_info from mediapipe.tasks.python.metadata.metadata_writers import metadata_writer from mediapipe.tasks.python.metadata.metadata_writers import object_detector as object_detector_writer from official.vision.evaluation import coco_evaluator @@ -264,6 +265,27 @@ class ObjectDetector(classifier.Classifier): coco_metrics = coco_eval.result() 
return losses, coco_metrics + def _create_fixed_anchor( + self, anchor_box: List[float] + ) -> object_detector_writer.FixedAnchor: + """Helper function to create FixedAnchor objects from an anchor box array. + + Args: + anchor_box: List of anchor box coordinates in the format of [x_min, y_min, + x_max, y_max]. + + Returns: + A FixedAnchor object representing the anchor_box. + """ + image_shape = self._model_spec.input_image_shape[:2] + y_center_norm = (anchor_box[0] + anchor_box[2]) / (2 * image_shape[0]) + x_center_norm = (anchor_box[1] + anchor_box[3]) / (2 * image_shape[1]) + height_norm = (anchor_box[2] - anchor_box[0]) / image_shape[0] + width_norm = (anchor_box[3] - anchor_box[1]) / image_shape[1] + return object_detector_writer.FixedAnchor( + x_center_norm, y_center_norm, width_norm, height_norm + ) + def export_model( self, model_name: str = 'model.tflite', @@ -328,11 +350,40 @@ class ObjectDetector(classifier.Classifier): converter.target_spec.supported_ops = (tf.lite.OpsSet.TFLITE_BUILTINS,) tflite_model = converter.convert() - writer = object_detector_writer.MetadataWriter.create_for_models_with_nms( + # Build anchors + raw_anchor_boxes = self._preprocessor.anchor_boxes + anchors = [] + for _, anchor_boxes in raw_anchor_boxes.items(): + anchor_boxes_reshaped = anchor_boxes.numpy().reshape((-1, 4)) + for ab in anchor_boxes_reshaped: + anchors.append(self._create_fixed_anchor(ab)) + + ssd_anchors_options = object_detector_writer.SsdAnchorsOptions( + object_detector_writer.FixedAnchorsSchema(anchors) + ) + + tensor_decoding_options = object_detector_writer.TensorsDecodingOptions( + num_classes=self._num_classes, + num_boxes=len(anchors), + num_coords=4, + keypoint_coord_offset=0, + num_keypoints=0, + num_values_per_keypoint=2, + x_scale=1, + y_scale=1, + w_scale=1, + h_scale=1, + apply_exponential_on_box_size=True, + sigmoid_score=False, + ) + writer = object_detector_writer.MetadataWriter.create_for_models_without_nms( tflite_model, self._model_spec.mean_rgb, self._model_spec.stddev_rgb, labels=metadata_writer.Labels().add(list(self._label_names)), + ssd_anchors_options=ssd_anchors_options, + tensors_decoding_options=tensor_decoding_options, + output_tensors_order=metadata_info.RawDetectionOutputTensorsOrder.LOCATION_SCORE, ) tflite_model_with_metadata, metadata_json = writer.populate() model_util.save_tflite(tflite_model_with_metadata, tflite_file) diff --git a/mediapipe/model_maker/python/vision/object_detector/preprocessor.py b/mediapipe/model_maker/python/vision/object_detector/preprocessor.py index b4e08f997..ebea6a07b 100644 --- a/mediapipe/model_maker/python/vision/object_detector/preprocessor.py +++ b/mediapipe/model_maker/python/vision/object_detector/preprocessor.py @@ -44,6 +44,26 @@ class Preprocessor(object): self._aug_scale_max = 2.0 self._max_num_instances = 100 + self._padded_size = preprocess_ops.compute_padded_size( + self._output_size, 2**self._max_level + ) + + input_anchor = anchor.build_anchor_generator( + min_level=self._min_level, + max_level=self._max_level, + num_scales=self._num_scales, + aspect_ratios=self._aspect_ratios, + anchor_size=self._anchor_size, + ) + self._anchor_boxes = input_anchor(image_size=self._output_size) + self._anchor_labeler = anchor.AnchorLabeler( + self._match_threshold, self._unmatched_threshold + ) + + @property + def anchor_boxes(self): + return self._anchor_boxes + def __call__( self, data: Mapping[str, Any], is_training: bool = True ) -> Tuple[tf.Tensor, Mapping[str, Any]]: @@ -90,13 +110,10 @@ class Preprocessor(object): 
image, image_info = preprocess_ops.resize_and_crop_image( image, self._output_size, - padded_size=preprocess_ops.compute_padded_size( - self._output_size, 2**self._max_level - ), + padded_size=self._padded_size, aug_scale_min=(self._aug_scale_min if is_training else 1.0), aug_scale_max=(self._aug_scale_max if is_training else 1.0), ) - image_height, image_width, _ = image.get_shape().as_list() # Resize and crop boxes. image_scale = image_info[2, :] @@ -110,20 +127,9 @@ class Preprocessor(object): classes = tf.gather(classes, indices) # Assign anchors. - input_anchor = anchor.build_anchor_generator( - min_level=self._min_level, - max_level=self._max_level, - num_scales=self._num_scales, - aspect_ratios=self._aspect_ratios, - anchor_size=self._anchor_size, - ) - anchor_boxes = input_anchor(image_size=(image_height, image_width)) - anchor_labeler = anchor.AnchorLabeler( - self._match_threshold, self._unmatched_threshold - ) (cls_targets, box_targets, _, cls_weights, box_weights) = ( - anchor_labeler.label_anchors( - anchor_boxes, boxes, tf.expand_dims(classes, axis=1) + self._anchor_labeler.label_anchors( + self.anchor_boxes, boxes, tf.expand_dims(classes, axis=1) ) ) @@ -134,7 +140,7 @@ class Preprocessor(object): labels = { 'cls_targets': cls_targets, 'box_targets': box_targets, - 'anchor_boxes': anchor_boxes, + 'anchor_boxes': self.anchor_boxes, 'cls_weights': cls_weights, 'box_weights': box_weights, 'image_info': image_info, diff --git a/mediapipe/tasks/cc/vision/face_stylizer/face_stylizer_graph.cc b/mediapipe/tasks/cc/vision/face_stylizer/face_stylizer_graph.cc index cb49ef59d..d7265a146 100644 --- a/mediapipe/tasks/cc/vision/face_stylizer/face_stylizer_graph.cc +++ b/mediapipe/tasks/cc/vision/face_stylizer/face_stylizer_graph.cc @@ -361,9 +361,10 @@ class FaceStylizerGraph : public core::ModelTaskGraph { auto& tensors_to_image = graph.AddNode("mediapipe.tasks.TensorsToImageCalculator"); - ConfigureTensorsToImageCalculator( - image_to_tensor_options, - &tensors_to_image.GetOptions()); + auto& tensors_to_image_options = + tensors_to_image.GetOptions(); + tensors_to_image_options.mutable_input_tensor_float_range()->set_min(-1); + tensors_to_image_options.mutable_input_tensor_float_range()->set_max(1); face_alignment_image >> tensors_to_image.In(kTensorsTag); face_alignment = tensors_to_image.Out(kImageTag).Cast(); diff --git a/mediapipe/tasks/cc/vision/image_segmenter/BUILD b/mediapipe/tasks/cc/vision/image_segmenter/BUILD index 183b1bb86..fc977c0b5 100644 --- a/mediapipe/tasks/cc/vision/image_segmenter/BUILD +++ b/mediapipe/tasks/cc/vision/image_segmenter/BUILD @@ -63,6 +63,8 @@ cc_library( "//mediapipe/calculators/image:image_properties_calculator", "//mediapipe/calculators/image:image_transformation_calculator", "//mediapipe/calculators/image:image_transformation_calculator_cc_proto", + "//mediapipe/calculators/image:set_alpha_calculator", + "//mediapipe/calculators/image:set_alpha_calculator_cc_proto", "//mediapipe/calculators/tensor:image_to_tensor_calculator", "//mediapipe/calculators/tensor:image_to_tensor_calculator_cc_proto", "//mediapipe/calculators/tensor:inference_calculator", diff --git a/mediapipe/tasks/cc/vision/image_segmenter/calculators/segmentation_postprocessor_gl.cc b/mediapipe/tasks/cc/vision/image_segmenter/calculators/segmentation_postprocessor_gl.cc index 5b212069f..311f8d6aa 100644 --- a/mediapipe/tasks/cc/vision/image_segmenter/calculators/segmentation_postprocessor_gl.cc +++ 
b/mediapipe/tasks/cc/vision/image_segmenter/calculators/segmentation_postprocessor_gl.cc @@ -188,7 +188,7 @@ void main() { // Special argmax shader for N=1 classes. We don't need to worry about softmax // activation (it is assumed softmax requires N > 1 classes), but this should // occur after SIGMOID activation if specified. Instead of a true argmax, we -// simply use 0.5 as the cutoff, assigning 1 (foreground) or 0 (background) +// simply use 0.5 as the cutoff, assigning 0 (foreground) or 255 (background) // based on whether the confidence value reaches this cutoff or not, // respectively. static constexpr char kArgmaxOneClassShader[] = R"( @@ -199,12 +199,12 @@ uniform sampler2D input_texture; void main() { float input_val = texture2D(input_texture, sample_coordinate).x; // Category is just value rounded to nearest integer; then we map to either - // 0 or 1/255 accordingly. If the input has been activated properly, then the + // 0 or 1 accordingly. If the input has been activated properly, then the // values should always be in the range [0, 1]. But just in case it hasn't, to // avoid category overflow issues when the activation function is not properly // chosen, we add an extra clamp here, as performance hit is minimal. - float category = clamp(floor(input_val + 0.5), 0.0, 1.0); - gl_FragColor = vec4(category / 255.0, 0.0, 0.0, 1.0); + float category = clamp(floor(1.5 - input_val), 0.0, 1.0); + gl_FragColor = vec4(category, 0.0, 0.0, 1.0); })"; // Softmax is in 3 steps: diff --git a/mediapipe/tasks/cc/vision/image_segmenter/calculators/tensors_to_segmentation_calculator.cc b/mediapipe/tasks/cc/vision/image_segmenter/calculators/tensors_to_segmentation_calculator.cc index c2d1520dd..660dc59b7 100644 --- a/mediapipe/tasks/cc/vision/image_segmenter/calculators/tensors_to_segmentation_calculator.cc +++ b/mediapipe/tasks/cc/vision/image_segmenter/calculators/tensors_to_segmentation_calculator.cc @@ -61,6 +61,8 @@ using ::mediapipe::tasks::vision::GetImageLikeTensorShape; using ::mediapipe::tasks::vision::Shape; using ::mediapipe::tasks::vision::image_segmenter::proto::SegmenterOptions; +constexpr uint8_t kUnLabeledPixelValue = 255; + void StableSoftmax(absl::Span values, absl::Span activated_values) { float max_value = *std::max_element(values.begin(), values.end()); @@ -153,9 +155,11 @@ Image ProcessForCategoryMaskCpu(const Shape& input_shape, } if (input_channels == 1) { // if the input tensor is a single mask, it is assumed to be a binary - // foreground segmentation mask. For such a mask, we make foreground - // category 1, and background category 0. - pixel = static_cast(confidence_scores[0] > 0.5f); + // foreground segmentation mask. For such a mask, instead of a true + // argmax, we simply use 0.5 as the cutoff, assigning 0 (foreground) or + // 255 (background) based on whether the confidence value reaches this + // cutoff or not, respectively. + pixel = confidence_scores[0] > 0.5f ? 0 : kUnLabeledPixelValue; } else { const int maximum_category_idx = std::max_element(confidence_scores.begin(), confidence_scores.end()) - diff --git a/mediapipe/tasks/cc/vision/image_segmenter/image_segmenter_graph.cc b/mediapipe/tasks/cc/vision/image_segmenter/image_segmenter_graph.cc index a52d3fa9a..6ecfa3685 100644 --- a/mediapipe/tasks/cc/vision/image_segmenter/image_segmenter_graph.cc +++ b/mediapipe/tasks/cc/vision/image_segmenter/image_segmenter_graph.cc @@ -23,6 +23,7 @@ limitations under the License. 
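Both the one-class GL shader and the CPU path in `tensors_to_segmentation_calculator.cc` above now encode the same convention for single-channel masks: confidence above 0.5 becomes category 0 (the lone foreground label) and everything else becomes 255, the unlabeled sentinel. This is also why the segmenter tests further down switch the `SimilarToUint8Mask` magnitude from 255 to 1. A NumPy sketch of that rule with made-up confidence values:

```python
# Illustrative NumPy restatement of the single-class category-mask rule.
import numpy as np

UNLABELED_PIXEL_VALUE = 255  # mirrors kUnLabeledPixelValue

def single_class_category_mask(confidence: np.ndarray) -> np.ndarray:
    """confidence: float array of scores in [0, 1], e.g. shape (height, width)."""
    return np.where(confidence > 0.5, 0, UNLABELED_PIXEL_VALUE).astype(np.uint8)

print(single_class_category_mask(np.array([[0.9, 0.2], [0.51, 0.5]])))
# [[  0 255]
#  [  0 255]]
```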
#include "absl/strings/str_format.h" #include "mediapipe/calculators/image/image_clone_calculator.pb.h" #include "mediapipe/calculators/image/image_transformation_calculator.pb.h" +#include "mediapipe/calculators/image/set_alpha_calculator.pb.h" #include "mediapipe/calculators/tensor/tensor_converter_calculator.pb.h" #include "mediapipe/framework/api2/builder.h" #include "mediapipe/framework/api2/port.h" @@ -249,7 +250,8 @@ void ConfigureTensorConverterCalculator( // the tflite model. absl::StatusOr ConvertImageToTensors( Source image_in, Source norm_rect_in, bool use_gpu, - const core::ModelResources& model_resources, Graph& graph) { + bool is_hair_segmentation, const core::ModelResources& model_resources, + Graph& graph) { ASSIGN_OR_RETURN(const tflite::Tensor* tflite_input_tensor, GetInputTensor(model_resources)); if (tflite_input_tensor->shape()->size() != 4) { @@ -294,9 +296,17 @@ absl::StatusOr ConvertImageToTensors( // Convert from Image to legacy ImageFrame or GpuBuffer. auto& from_image = graph.AddNode("FromImageCalculator"); image_on_device >> from_image.In(kImageTag); - auto image_cpu_or_gpu = + Source image_cpu_or_gpu = from_image.Out(use_gpu ? kImageGpuTag : kImageCpuTag); + if (is_hair_segmentation) { + auto& set_alpha = graph.AddNode("SetAlphaCalculator"); + set_alpha.GetOptions() + .set_alpha_value(0); + image_cpu_or_gpu >> set_alpha.In(use_gpu ? kImageGpuTag : kImageTag); + image_cpu_or_gpu = set_alpha.Out(use_gpu ? kImageGpuTag : kImageTag); + } + // Resize the input image to the model input size. auto& image_transformation = graph.AddNode("ImageTransformationCalculator"); ConfigureImageTransformationCalculator( @@ -461,22 +471,41 @@ class ImageSegmenterGraph : public core::ModelTaskGraph { bool use_gpu = components::processors::DetermineImagePreprocessingGpuBackend( task_options.base_options().acceleration()); - ASSIGN_OR_RETURN(auto image_and_tensors, - ConvertImageToTensors(image_in, norm_rect_in, use_gpu, - model_resources, graph)); - // Adds inference subgraph and connects its input stream to the output - // tensors produced by the ImageToTensorCalculator. - auto& inference = AddInference( - model_resources, task_options.base_options().acceleration(), graph); - image_and_tensors.tensors >> inference.In(kTensorsTag); - // Adds segmentation calculators for output streams. + // Adds segmentation calculators for output streams. Add this calculator + // first to get the labels. auto& tensor_to_images = graph.AddNode("mediapipe.tasks.TensorsToSegmentationCalculator"); RET_CHECK_OK(ConfigureTensorsToSegmentationCalculator( task_options, model_resources, &tensor_to_images .GetOptions())); + const auto& tensor_to_images_options = + tensor_to_images.GetOptions(); + + // TODO: remove special logic for hair segmentation model. + // The alpha channel of hair segmentation model indicates the interested + // area. The model was designed for live stream mode, so that the mask of + // previous frame is used as the indicator for the next frame. For the first + // frame, it expects the alpha channel to be empty. To consolidate IMAGE, + // VIDEO and LIVE_STREAM mode in mediapipe tasks, here we forcely set the + // alpha channel to be empty if we find the model is the hair segmentation + // model. 
+ bool is_hair_segmentation = false; + if (tensor_to_images_options.label_items_size() == 2 && + tensor_to_images_options.label_items().at(1).name() == "hair") { + is_hair_segmentation = true; + } + + ASSIGN_OR_RETURN( + auto image_and_tensors, + ConvertImageToTensors(image_in, norm_rect_in, use_gpu, + is_hair_segmentation, model_resources, graph)); + // Adds inference subgraph and connects its input stream to the output + // tensors produced by the ImageToTensorCalculator. + auto& inference = AddInference( + model_resources, task_options.base_options().acceleration(), graph); + image_and_tensors.tensors >> inference.In(kTensorsTag); inference.Out(kTensorsTag) >> tensor_to_images.In(kTensorsTag); // Adds image property calculator for output size. diff --git a/mediapipe/tasks/cc/vision/image_segmenter/image_segmenter_test.cc b/mediapipe/tasks/cc/vision/image_segmenter/image_segmenter_test.cc index 339ec1424..656ed0715 100644 --- a/mediapipe/tasks/cc/vision/image_segmenter/image_segmenter_test.cc +++ b/mediapipe/tasks/cc/vision/image_segmenter/image_segmenter_test.cc @@ -30,6 +30,7 @@ limitations under the License. #include "mediapipe/framework/port/opencv_imgcodecs_inc.h" #include "mediapipe/framework/port/opencv_imgproc_inc.h" #include "mediapipe/framework/port/status_matchers.h" +#include "mediapipe/framework/tool/test_util.h" #include "mediapipe/tasks/cc/components/containers/rect.h" #include "mediapipe/tasks/cc/core/base_options.h" #include "mediapipe/tasks/cc/core/proto/base_options.pb.h" @@ -425,6 +426,28 @@ TEST_F(ImageModeTest, SucceedsSelfie144x256Segmentations) { SimilarToFloatMask(expected_mask_float, kGoldenMaskSimilarity)); } +TEST_F(ImageModeTest, SucceedsSelfieSegmentationSingleLabel) { + auto options = std::make_unique(); + options->base_options.model_asset_path = + JoinPath("./", kTestDataDirectory, kSelfieSegmentation); + MP_ASSERT_OK_AND_ASSIGN(std::unique_ptr segmenter, + ImageSegmenter::Create(std::move(options))); + ASSERT_EQ(segmenter->GetLabels().size(), 1); + EXPECT_EQ(segmenter->GetLabels()[0], "selfie"); + MP_ASSERT_OK(segmenter->Close()); +} + +TEST_F(ImageModeTest, SucceedsSelfieSegmentationLandscapeSingleLabel) { + auto options = std::make_unique(); + options->base_options.model_asset_path = + JoinPath("./", kTestDataDirectory, kSelfieSegmentationLandscape); + MP_ASSERT_OK_AND_ASSIGN(std::unique_ptr segmenter, + ImageSegmenter::Create(std::move(options))); + ASSERT_EQ(segmenter->GetLabels().size(), 1); + EXPECT_EQ(segmenter->GetLabels()[0], "selfie"); + MP_ASSERT_OK(segmenter->Close()); +} + TEST_F(ImageModeTest, SucceedsPortraitSelfieSegmentationConfidenceMask) { Image image = GetSRGBImage(JoinPath("./", kTestDataDirectory, "portrait.jpg")); @@ -464,6 +487,9 @@ TEST_F(ImageModeTest, SucceedsPortraitSelfieSegmentationCategoryMask) { EXPECT_TRUE(result.category_mask.has_value()); MP_ASSERT_OK(segmenter->Close()); + MP_EXPECT_OK( + SavePngTestOutput(*result.category_mask->GetImageFrameSharedPtr(), + "portrait_selfie_segmentation_expected_category_mask")); cv::Mat selfie_mask = mediapipe::formats::MatView( result.category_mask->GetImageFrameSharedPtr().get()); cv::Mat expected_mask = cv::imread( @@ -471,7 +497,7 @@ TEST_F(ImageModeTest, SucceedsPortraitSelfieSegmentationCategoryMask) { "portrait_selfie_segmentation_expected_category_mask.jpg"), cv::IMREAD_GRAYSCALE); EXPECT_THAT(selfie_mask, - SimilarToUint8Mask(expected_mask, kGoldenMaskSimilarity, 255)); + SimilarToUint8Mask(expected_mask, kGoldenMaskSimilarity, 1)); } TEST_F(ImageModeTest, 
SucceedsPortraitSelfieSegmentationLandscapeCategoryMask) { @@ -487,6 +513,9 @@ TEST_F(ImageModeTest, SucceedsPortraitSelfieSegmentationLandscapeCategoryMask) { EXPECT_TRUE(result.category_mask.has_value()); MP_ASSERT_OK(segmenter->Close()); + MP_EXPECT_OK(SavePngTestOutput( + *result.category_mask->GetImageFrameSharedPtr(), + "portrait_selfie_segmentation_landscape_expected_category_mask")); cv::Mat selfie_mask = mediapipe::formats::MatView( result.category_mask->GetImageFrameSharedPtr().get()); cv::Mat expected_mask = cv::imread( @@ -495,7 +524,7 @@ TEST_F(ImageModeTest, SucceedsPortraitSelfieSegmentationLandscapeCategoryMask) { "portrait_selfie_segmentation_landscape_expected_category_mask.jpg"), cv::IMREAD_GRAYSCALE); EXPECT_THAT(selfie_mask, - SimilarToUint8Mask(expected_mask, kGoldenMaskSimilarity, 255)); + SimilarToUint8Mask(expected_mask, kGoldenMaskSimilarity, 1)); } TEST_F(ImageModeTest, SucceedsHairSegmentation) { diff --git a/mediapipe/tasks/cc/vision/object_detector/object_detector.cc b/mediapipe/tasks/cc/vision/object_detector/object_detector.cc index 01fd3eb7b..152ee3273 100644 --- a/mediapipe/tasks/cc/vision/object_detector/object_detector.cc +++ b/mediapipe/tasks/cc/vision/object_detector/object_detector.cc @@ -129,9 +129,17 @@ absl::StatusOr> ObjectDetector::Create( if (status_or_packets.value()[kImageOutStreamName].IsEmpty()) { return; } + Packet image_packet = status_or_packets.value()[kImageOutStreamName]; Packet detections_packet = status_or_packets.value()[kDetectionsOutStreamName]; - Packet image_packet = status_or_packets.value()[kImageOutStreamName]; + if (detections_packet.IsEmpty()) { + Packet empty_packet = + status_or_packets.value()[kDetectionsOutStreamName]; + result_callback( + {ConvertToDetectionResult({})}, image_packet.Get(), + empty_packet.Timestamp().Value() / kMicroSecondsPerMilliSecond); + return; + } result_callback(ConvertToDetectionResult( detections_packet.Get>()), image_packet.Get(), @@ -165,6 +173,9 @@ absl::StatusOr ObjectDetector::Detect( ProcessImageData( {{kImageInStreamName, MakePacket(std::move(image))}, {kNormRectName, MakePacket(std::move(norm_rect))}})); + if (output_packets[kDetectionsOutStreamName].IsEmpty()) { + return {ConvertToDetectionResult({})}; + } return ConvertToDetectionResult( output_packets[kDetectionsOutStreamName].Get>()); } @@ -190,6 +201,9 @@ absl::StatusOr ObjectDetector::DetectForVideo( {kNormRectName, MakePacket(std::move(norm_rect)) .At(Timestamp(timestamp_ms * kMicroSecondsPerMilliSecond))}})); + if (output_packets[kDetectionsOutStreamName].IsEmpty()) { + return {ConvertToDetectionResult({})}; + } return ConvertToDetectionResult( output_packets[kDetectionsOutStreamName].Get>()); } diff --git a/mediapipe/tasks/cc/vision/object_detector/object_detector_test.cc b/mediapipe/tasks/cc/vision/object_detector/object_detector_test.cc index 8642af7c4..e66fc19bb 100644 --- a/mediapipe/tasks/cc/vision/object_detector/object_detector_test.cc +++ b/mediapipe/tasks/cc/vision/object_detector/object_detector_test.cc @@ -499,6 +499,22 @@ TEST_F(ImageModeTest, SucceedsEfficientDetNoNmsModel) { })pb")})); } +TEST_F(ImageModeTest, SucceedsNoObjectDetected) { + MP_ASSERT_OK_AND_ASSIGN(Image image, + DecodeImageFromFile(JoinPath("./", kTestDataDirectory, + "cats_and_dogs.jpg"))); + auto options = std::make_unique(); + options->max_results = 4; + options->score_threshold = 1.0f; + options->base_options.model_asset_path = + JoinPath("./", kTestDataDirectory, kEfficientDetWithoutNms); + MP_ASSERT_OK_AND_ASSIGN(std::unique_ptr 
object_detector, + ObjectDetector::Create(std::move(options))); + MP_ASSERT_OK_AND_ASSIGN(auto results, object_detector->Detect(image)); + MP_ASSERT_OK(object_detector->Close()); + EXPECT_THAT(results.detections, testing::IsEmpty()); +} + TEST_F(ImageModeTest, SucceedsWithoutImageResizing) { MP_ASSERT_OK_AND_ASSIGN(Image image, DecodeImageFromFile(JoinPath( "./", kTestDataDirectory, diff --git a/mediapipe/tasks/cc/vision/pose_landmarker/pose_landmarker.cc b/mediapipe/tasks/cc/vision/pose_landmarker/pose_landmarker.cc index f421c7376..797e71488 100644 --- a/mediapipe/tasks/cc/vision/pose_landmarker/pose_landmarker.cc +++ b/mediapipe/tasks/cc/vision/pose_landmarker/pose_landmarker.cc @@ -63,8 +63,6 @@ constexpr char kNormLandmarksTag[] = "NORM_LANDMARKS"; constexpr char kNormLandmarksStreamName[] = "norm_landmarks"; constexpr char kPoseWorldLandmarksTag[] = "WORLD_LANDMARKS"; constexpr char kPoseWorldLandmarksStreamName[] = "world_landmarks"; -constexpr char kPoseAuxiliaryLandmarksTag[] = "AUXILIARY_LANDMARKS"; -constexpr char kPoseAuxiliaryLandmarksStreamName[] = "auxiliary_landmarks"; constexpr int kMicroSecondsPerMilliSecond = 1000; // Creates a MediaPipe graph config that contains a subgraph node of @@ -83,9 +81,6 @@ CalculatorGraphConfig CreateGraphConfig( graph.Out(kNormLandmarksTag); subgraph.Out(kPoseWorldLandmarksTag).SetName(kPoseWorldLandmarksStreamName) >> graph.Out(kPoseWorldLandmarksTag); - subgraph.Out(kPoseAuxiliaryLandmarksTag) - .SetName(kPoseAuxiliaryLandmarksStreamName) >> - graph.Out(kPoseAuxiliaryLandmarksTag); subgraph.Out(kImageTag).SetName(kImageOutStreamName) >> graph.Out(kImageTag); if (output_segmentation_masks) { subgraph.Out(kSegmentationMaskTag).SetName(kSegmentationMaskStreamName) >> @@ -163,8 +158,6 @@ absl::StatusOr> PoseLandmarker::Create( status_or_packets.value()[kNormLandmarksStreamName]; Packet pose_world_landmarks_packet = status_or_packets.value()[kPoseWorldLandmarksStreamName]; - Packet pose_auxiliary_landmarks_packet = - status_or_packets.value()[kPoseAuxiliaryLandmarksStreamName]; std::optional> segmentation_mask = std::nullopt; if (output_segmentation_masks) { segmentation_mask = segmentation_mask_packet.Get>(); @@ -175,9 +168,7 @@ absl::StatusOr> PoseLandmarker::Create( /* pose_landmarks= */ pose_landmarks_packet.Get>(), /* pose_world_landmarks= */ - pose_world_landmarks_packet.Get>(), - pose_auxiliary_landmarks_packet - .Get>()), + pose_world_landmarks_packet.Get>()), image_packet.Get(), pose_landmarks_packet.Timestamp().Value() / kMicroSecondsPerMilliSecond); @@ -234,10 +225,7 @@ absl::StatusOr PoseLandmarker::Detect( .Get>(), /* pose_world_landmarks */ output_packets[kPoseWorldLandmarksStreamName] - .Get>(), - /*pose_auxiliary_landmarks= */ - output_packets[kPoseAuxiliaryLandmarksStreamName] - .Get>()); + .Get>()); } absl::StatusOr PoseLandmarker::DetectForVideo( @@ -277,10 +265,7 @@ absl::StatusOr PoseLandmarker::DetectForVideo( .Get>(), /* pose_world_landmarks */ output_packets[kPoseWorldLandmarksStreamName] - .Get>(), - /* pose_auxiliary_landmarks= */ - output_packets[kPoseAuxiliaryLandmarksStreamName] - .Get>()); + .Get>()); } absl::Status PoseLandmarker::DetectAsync( diff --git a/mediapipe/tasks/cc/vision/pose_landmarker/pose_landmarker_result.cc b/mediapipe/tasks/cc/vision/pose_landmarker/pose_landmarker_result.cc index 77f374d1e..da4c630b3 100644 --- a/mediapipe/tasks/cc/vision/pose_landmarker/pose_landmarker_result.cc +++ b/mediapipe/tasks/cc/vision/pose_landmarker/pose_landmarker_result.cc @@ -27,15 +27,12 @@ namespace pose_landmarker 
{ PoseLandmarkerResult ConvertToPoseLandmarkerResult( std::optional> segmentation_masks, const std::vector& pose_landmarks_proto, - const std::vector& pose_world_landmarks_proto, - const std::vector& - pose_auxiliary_landmarks_proto) { + const std::vector& pose_world_landmarks_proto) { PoseLandmarkerResult result; result.segmentation_masks = segmentation_masks; result.pose_landmarks.resize(pose_landmarks_proto.size()); result.pose_world_landmarks.resize(pose_world_landmarks_proto.size()); - result.pose_auxiliary_landmarks.resize(pose_auxiliary_landmarks_proto.size()); std::transform(pose_landmarks_proto.begin(), pose_landmarks_proto.end(), result.pose_landmarks.begin(), components::containers::ConvertToNormalizedLandmarks); @@ -43,10 +40,6 @@ PoseLandmarkerResult ConvertToPoseLandmarkerResult( pose_world_landmarks_proto.end(), result.pose_world_landmarks.begin(), components::containers::ConvertToLandmarks); - std::transform(pose_auxiliary_landmarks_proto.begin(), - pose_auxiliary_landmarks_proto.end(), - result.pose_auxiliary_landmarks.begin(), - components::containers::ConvertToNormalizedLandmarks); return result; } diff --git a/mediapipe/tasks/cc/vision/pose_landmarker/pose_landmarker_result.h b/mediapipe/tasks/cc/vision/pose_landmarker/pose_landmarker_result.h index f45994837..8978e5147 100644 --- a/mediapipe/tasks/cc/vision/pose_landmarker/pose_landmarker_result.h +++ b/mediapipe/tasks/cc/vision/pose_landmarker/pose_landmarker_result.h @@ -37,17 +37,12 @@ struct PoseLandmarkerResult { std::vector pose_landmarks; // Detected pose landmarks in world coordinates. std::vector pose_world_landmarks; - // Detected auxiliary landmarks, used for deriving ROI for next frame. - std::vector - pose_auxiliary_landmarks; }; PoseLandmarkerResult ConvertToPoseLandmarkerResult( std::optional> segmentation_mask, const std::vector& pose_landmarks_proto, - const std::vector& pose_world_landmarks_proto, - const std::vector& - pose_auxiliary_landmarks_proto); + const std::vector& pose_world_landmarks_proto); } // namespace pose_landmarker } // namespace vision diff --git a/mediapipe/tasks/cc/vision/pose_landmarker/pose_landmarker_result_test.cc b/mediapipe/tasks/cc/vision/pose_landmarker/pose_landmarker_result_test.cc index 14916215c..05e83b655 100644 --- a/mediapipe/tasks/cc/vision/pose_landmarker/pose_landmarker_result_test.cc +++ b/mediapipe/tasks/cc/vision/pose_landmarker/pose_landmarker_result_test.cc @@ -47,13 +47,6 @@ TEST(ConvertFromProto, Succeeds) { landmark_proto.set_y(5.2); landmark_proto.set_z(4.3); - mediapipe::NormalizedLandmarkList auxiliary_landmark_list_proto; - mediapipe::NormalizedLandmark& auxiliary_landmark_proto = - *auxiliary_landmark_list_proto.add_landmark(); - auxiliary_landmark_proto.set_x(0.5); - auxiliary_landmark_proto.set_y(0.5); - auxiliary_landmark_proto.set_z(0.5); - std::vector segmentation_masks_lists = {segmentation_mask}; std::vector normalized_landmarks_lists = { @@ -62,12 +55,9 @@ TEST(ConvertFromProto, Succeeds) { std::vector world_landmarks_lists = { world_landmark_list_proto}; - std::vector auxiliary_landmarks_lists = { - auxiliary_landmark_list_proto}; - PoseLandmarkerResult pose_landmarker_result = ConvertToPoseLandmarkerResult( segmentation_masks_lists, normalized_landmarks_lists, - world_landmarks_lists, auxiliary_landmarks_lists); + world_landmarks_lists); EXPECT_EQ(pose_landmarker_result.pose_landmarks.size(), 1); EXPECT_EQ(pose_landmarker_result.pose_landmarks[0].landmarks.size(), 1); @@ -82,14 +72,6 @@ TEST(ConvertFromProto, Succeeds) { 
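The auxiliary landmarks (previously only used to derive the ROI for the next frame) disappear from the C++ result here, and the Java and Python containers lose the matching field later in this diff. A minimal Python sketch of the slimmed-down result, assuming the in-repo module path `mediapipe.tasks.python.vision.pose_landmarker`:

```python
# Sketch only: PoseLandmarkerResult now carries two landmark lists plus optional
# segmentation masks; pose_auxiliary_landmarks no longer exists on any surface.
from mediapipe.tasks.python.vision import pose_landmarker

empty_result = pose_landmarker.PoseLandmarkerResult(
    pose_landmarks=[], pose_world_landmarks=[])  # segmentation_masks stays at its default
assert not hasattr(empty_result, 'pose_auxiliary_landmarks')
```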
testing::FieldsAre(testing::FloatEq(3.1), testing::FloatEq(5.2), testing::FloatEq(4.3), std::nullopt, std::nullopt, std::nullopt)); - - EXPECT_EQ(pose_landmarker_result.pose_auxiliary_landmarks.size(), 1); - EXPECT_EQ(pose_landmarker_result.pose_auxiliary_landmarks[0].landmarks.size(), - 1); - EXPECT_THAT(pose_landmarker_result.pose_auxiliary_landmarks[0].landmarks[0], - testing::FieldsAre(testing::FloatEq(0.5), testing::FloatEq(0.5), - testing::FloatEq(0.5), std::nullopt, - std::nullopt, std::nullopt)); } } // namespace pose_landmarker diff --git a/mediapipe/tasks/ios/common/utils/sources/NSString+Helpers.mm b/mediapipe/tasks/ios/common/utils/sources/NSString+Helpers.mm index dfc7749be..5f484fce5 100644 --- a/mediapipe/tasks/ios/common/utils/sources/NSString+Helpers.mm +++ b/mediapipe/tasks/ios/common/utils/sources/NSString+Helpers.mm @@ -24,7 +24,7 @@ return [NSString stringWithCString:text.c_str() encoding:[NSString defaultCStringEncoding]]; } -+ (NSString *)uuidString{ ++ (NSString *)uuidString { return [[NSUUID UUID] UUIDString]; } diff --git a/mediapipe/tasks/ios/components/containers/sources/MPPDetection.m b/mediapipe/tasks/ios/components/containers/sources/MPPDetection.m index c245478db..c61cf0b39 100644 --- a/mediapipe/tasks/ios/components/containers/sources/MPPDetection.m +++ b/mediapipe/tasks/ios/components/containers/sources/MPPDetection.m @@ -28,7 +28,12 @@ return self; } -// TODO: Implement hash +- (NSUInteger)hash { + NSUInteger nonNullPropertiesHash = + @(self.location.x).hash ^ @(self.location.y).hash ^ @(self.score).hash; + + return self.label ? nonNullPropertiesHash ^ self.label.hash : nonNullPropertiesHash; +} - (BOOL)isEqual:(nullable id)object { if (!object) { diff --git a/mediapipe/tasks/ios/test/vision/image_classifier/MPPImageClassifierTests.m b/mediapipe/tasks/ios/test/vision/image_classifier/MPPImageClassifierTests.m index 7eb93df8e..8db71a11b 100644 --- a/mediapipe/tasks/ios/test/vision/image_classifier/MPPImageClassifierTests.m +++ b/mediapipe/tasks/ios/test/vision/image_classifier/MPPImageClassifierTests.m @@ -452,13 +452,14 @@ static NSString *const kLiveStreamTestsDictExpectationKey = @"expectation"; [self assertCreateImageClassifierWithOptions:options failsWithExpectedError: - [NSError errorWithDomain:kExpectedErrorDomain - code:MPPTasksErrorCodeInvalidArgumentError - userInfo:@{ - NSLocalizedDescriptionKey : - @"The vision task is in image or video mode. The " - @"delegate must not be set in the task's options." - }]]; + [NSError + errorWithDomain:kExpectedErrorDomain + code:MPPTasksErrorCodeInvalidArgumentError + userInfo:@{ + NSLocalizedDescriptionKey : + @"The vision task is in image or video mode. The " + @"delegate must not be set in the task's options." + }]]; } } @@ -469,15 +470,15 @@ static NSString *const kLiveStreamTestsDictExpectationKey = @"expectation"; [self assertCreateImageClassifierWithOptions:options failsWithExpectedError: - [NSError - errorWithDomain:kExpectedErrorDomain - code:MPPTasksErrorCodeInvalidArgumentError - userInfo:@{ - NSLocalizedDescriptionKey : - @"The vision task is in live stream mode. An object " - @"must be set as the delegate of the task in its " - @"options to ensure asynchronous delivery of results." - }]]; + [NSError errorWithDomain:kExpectedErrorDomain + code:MPPTasksErrorCodeInvalidArgumentError + userInfo:@{ + NSLocalizedDescriptionKey : + @"The vision task is in live stream mode. An " + @"object must be set as the delegate of the task " + @"in its options to ensure asynchronous delivery " + @"of results." 
+ }]]; } - (void)testClassifyFailsWithCallingWrongApiInImageMode { diff --git a/mediapipe/tasks/ios/test/vision/object_detector/MPPObjectDetectorTests.m b/mediapipe/tasks/ios/test/vision/object_detector/MPPObjectDetectorTests.m index d3b81703b..700df65a5 100644 --- a/mediapipe/tasks/ios/test/vision/object_detector/MPPObjectDetectorTests.m +++ b/mediapipe/tasks/ios/test/vision/object_detector/MPPObjectDetectorTests.m @@ -25,6 +25,8 @@ static NSDictionary *const kCatsAndDogsRotatedImage = static NSString *const kExpectedErrorDomain = @"com.google.mediapipe.tasks"; static const float pixelDifferenceTolerance = 10.0f; static const float scoreDifferenceTolerance = 0.02f; +static NSString *const kLiveStreamTestsDictObjectDetectorKey = @"object_detector"; +static NSString *const kLiveStreamTestsDictExpectationKey = @"expectation"; #define AssertEqualErrors(error, expectedError) \ XCTAssertNotNil(error); \ @@ -58,7 +60,10 @@ static const float scoreDifferenceTolerance = 0.02f; XCTAssertEqualWithAccuracy(boundingBox.size.height, expectedBoundingBox.size.height, \ pixelDifferenceTolerance, @"index i = %d", idx); -@interface MPPObjectDetectorTests : XCTestCase +@interface MPPObjectDetectorTests : XCTestCase { + NSDictionary *liveStreamSucceedsTestDict; + NSDictionary *outOfOrderTimestampTestDict; +} @end @implementation MPPObjectDetectorTests @@ -446,31 +451,28 @@ static const float scoreDifferenceTolerance = 0.02f; #pragma mark Running Mode Tests -- (void)testCreateObjectDetectorFailsWithResultListenerInNonLiveStreamMode { +- (void)testCreateObjectDetectorFailsWithDelegateInNonLiveStreamMode { MPPRunningMode runningModesToTest[] = {MPPRunningModeImage, MPPRunningModeVideo}; for (int i = 0; i < sizeof(runningModesToTest) / sizeof(runningModesToTest[0]); i++) { MPPObjectDetectorOptions *options = [self objectDetectorOptionsWithModelName:kModelName]; options.runningMode = runningModesToTest[i]; - options.completion = - ^(MPPObjectDetectionResult *result, NSInteger timestampInMilliseconds, NSError *error) { - }; + options.objectDetectorLiveStreamDelegate = self; [self assertCreateObjectDetectorWithOptions:options failsWithExpectedError: - [NSError - errorWithDomain:kExpectedErrorDomain - code:MPPTasksErrorCodeInvalidArgumentError - userInfo:@{ - NSLocalizedDescriptionKey : - @"The vision task is in image or video mode, a " - @"user-defined result callback should not be provided." - }]]; + [NSError errorWithDomain:kExpectedErrorDomain + code:MPPTasksErrorCodeInvalidArgumentError + userInfo:@{ + NSLocalizedDescriptionKey : + @"The vision task is in image or video mode. The " + @"delegate must not be set in the task's options." + }]]; } } -- (void)testCreateObjectDetectorFailsWithMissingResultListenerInLiveStreamMode { +- (void)testCreateObjectDetectorFailsWithMissingDelegateInLiveStreamMode { MPPObjectDetectorOptions *options = [self objectDetectorOptionsWithModelName:kModelName]; options.runningMode = MPPRunningModeLiveStream; @@ -481,8 +483,10 @@ static const float scoreDifferenceTolerance = 0.02f; code:MPPTasksErrorCodeInvalidArgumentError userInfo:@{ NSLocalizedDescriptionKey : - @"The vision task is in live stream mode, a " - @"user-defined result callback must be provided." + @"The vision task is in live stream mode. An " + @"object must be set as the delegate of the task " + @"in its options to ensure asynchronous delivery " + @"of results." 
}]]; } @@ -563,10 +567,7 @@ static const float scoreDifferenceTolerance = 0.02f; MPPObjectDetectorOptions *options = [self objectDetectorOptionsWithModelName:kModelName]; options.runningMode = MPPRunningModeLiveStream; - options.completion = - ^(MPPObjectDetectionResult *result, NSInteger timestampInMilliseconds, NSError *error) { - - }; + options.objectDetectorLiveStreamDelegate = self; MPPObjectDetector *objectDetector = [self objectDetectorWithOptionsSucceeds:options]; @@ -631,23 +632,17 @@ static const float scoreDifferenceTolerance = 0.02f; options.maxResults = maxResults; options.runningMode = MPPRunningModeLiveStream; + options.objectDetectorLiveStreamDelegate = self; XCTestExpectation *expectation = [[XCTestExpectation alloc] initWithDescription:@"detectWithOutOfOrderTimestampsAndLiveStream"]; expectation.expectedFulfillmentCount = 1; - options.completion = - ^(MPPObjectDetectionResult *result, NSInteger timestampInMilliseconds, NSError *error) { - [self assertObjectDetectionResult:result - isEqualToExpectedResult: - [MPPObjectDetectorTests - expectedDetectionResultForCatsAndDogsImageWithTimestampInMilliseconds: - timestampInMilliseconds] - expectedDetectionsCount:maxResults]; - [expectation fulfill]; - }; - MPPObjectDetector *objectDetector = [self objectDetectorWithOptionsSucceeds:options]; + liveStreamSucceedsTestDict = @{ + kLiveStreamTestsDictObjectDetectorKey : objectDetector, + kLiveStreamTestsDictExpectationKey : expectation + }; MPPImage *image = [self imageWithFileInfo:kCatsAndDogsImage]; @@ -695,19 +690,15 @@ static const float scoreDifferenceTolerance = 0.02f; expectation.expectedFulfillmentCount = iterationCount + 1; expectation.inverted = YES; - options.completion = - ^(MPPObjectDetectionResult *result, NSInteger timestampInMilliseconds, NSError *error) { - [self assertObjectDetectionResult:result - isEqualToExpectedResult: - [MPPObjectDetectorTests - expectedDetectionResultForCatsAndDogsImageWithTimestampInMilliseconds: - timestampInMilliseconds] - expectedDetectionsCount:maxResults]; - [expectation fulfill]; - }; + options.objectDetectorLiveStreamDelegate = self; MPPObjectDetector *objectDetector = [self objectDetectorWithOptionsSucceeds:options]; + liveStreamSucceedsTestDict = @{ + kLiveStreamTestsDictObjectDetectorKey : objectDetector, + kLiveStreamTestsDictExpectationKey : expectation + }; + // TODO: Mimic initialization from CMSampleBuffer as live stream mode is most likely to be used // with the iOS camera. AVCaptureVideoDataOutput sample buffer delegates provide frames of type // `CMSampleBuffer`. 
@@ -721,4 +712,24 @@ static const float scoreDifferenceTolerance = 0.02f; [self waitForExpectations:@[ expectation ] timeout:timeout]; } +#pragma mark MPPObjectDetectorLiveStreamDelegate Methods +- (void)objectDetector:(MPPObjectDetector *)objectDetector + didFinishDetectionWithResult:(MPPObjectDetectionResult *)objectDetectionResult + timestampInMilliseconds:(NSInteger)timestampInMilliseconds + error:(NSError *)error { + NSInteger maxResults = 4; + [self assertObjectDetectionResult:objectDetectionResult + isEqualToExpectedResult: + [MPPObjectDetectorTests + expectedDetectionResultForCatsAndDogsImageWithTimestampInMilliseconds: + timestampInMilliseconds] + expectedDetectionsCount:maxResults]; + + if (objectDetector == outOfOrderTimestampTestDict[kLiveStreamTestsDictObjectDetectorKey]) { + [outOfOrderTimestampTestDict[kLiveStreamTestsDictExpectationKey] fulfill]; + } else if (objectDetector == liveStreamSucceedsTestDict[kLiveStreamTestsDictObjectDetectorKey]) { + [liveStreamSucceedsTestDict[kLiveStreamTestsDictExpectationKey] fulfill]; + } +} + @end diff --git a/mediapipe/tasks/ios/vision/core/BUILD b/mediapipe/tasks/ios/vision/core/BUILD index 328d9e892..fe0fba0ef 100644 --- a/mediapipe/tasks/ios/vision/core/BUILD +++ b/mediapipe/tasks/ios/vision/core/BUILD @@ -63,5 +63,10 @@ objc_library( "//third_party/apple_frameworks:UIKit", "@com_google_absl//absl/status:statusor", "@ios_opencv//:OpencvFramework", - ], + ] + select({ + "@//third_party:opencv_ios_sim_arm64_source_build": ["@ios_opencv_source//:opencv_xcframework"], + "@//third_party:opencv_ios_sim_fat_source_build": ["@ios_opencv_source//:opencv_xcframework"], + "@//third_party:opencv_ios_arm64_source_build": ["@ios_opencv_source//:opencv_xcframework"], + "//conditions:default": [], + }), ) diff --git a/mediapipe/tasks/ios/vision/object_detector/sources/MPPObjectDetector.h b/mediapipe/tasks/ios/vision/object_detector/sources/MPPObjectDetector.h index 4443f56d1..ae18bf58d 100644 --- a/mediapipe/tasks/ios/vision/object_detector/sources/MPPObjectDetector.h +++ b/mediapipe/tasks/ios/vision/object_detector/sources/MPPObjectDetector.h @@ -96,6 +96,15 @@ NS_SWIFT_NAME(ObjectDetector) * `MPPImage`. Only use this method when the `MPPObjectDetector` is created with * `MPPRunningModeImage`. * + * This method supports classification of RGBA images. If your `MPPImage` has a source type of + * `MPPImageSourceTypePixelBuffer` or `MPPImageSourceTypeSampleBuffer`, the underlying pixel buffer + * must have one of the following pixel format types: + * 1. kCVPixelFormatType_32BGRA + * 2. kCVPixelFormatType_32RGBA + * + * If your `MPPImage` has a source type of `MPPImageSourceTypeImage` ensure that the color space is + * RGB with an Alpha channel. + * * @param image The `MPPImage` on which object detection is to be performed. * @param error An optional error parameter populated when there is an error in performing object * detection on the input image. @@ -115,6 +124,15 @@ NS_SWIFT_NAME(ObjectDetector) * the provided `MPPImage`. Only use this method when the `MPPObjectDetector` is created with * `MPPRunningModeVideo`. * + * This method supports classification of RGBA images. If your `MPPImage` has a source type of + * `MPPImageSourceTypePixelBuffer` or `MPPImageSourceTypeSampleBuffer`, the underlying pixel buffer + * must have one of the following pixel format types: + * 1. kCVPixelFormatType_32BGRA + * 2. 
kCVPixelFormatType_32RGBA + * + * If your `MPPImage` has a source type of `MPPImageSourceTypeImage` ensure that the color space is + * RGB with an Alpha channel. + * * @param image The `MPPImage` on which object detection is to be performed. * @param timestampInMilliseconds The video frame's timestamp (in milliseconds). The input * timestamps must be monotonically increasing. @@ -135,12 +153,28 @@ NS_SWIFT_NAME(ObjectDetector) * Sends live stream image data of type `MPPImage` to perform object detection using the whole * image as region of interest. Rotation will be applied according to the `orientation` property of * the provided `MPPImage`. Only use this method when the `MPPObjectDetector` is created with - * `MPPRunningModeLiveStream`. Results are provided asynchronously via the `completion` callback - * provided in the `MPPObjectDetectorOptions`. + * `MPPRunningModeLiveStream`. + * + * The object which needs to be continuously notified of the available results of object + * detection must confirm to `MPPObjectDetectorLiveStreamDelegate` protocol and implement the + * `objectDetector:didFinishDetectionWithResult:timestampInMilliseconds:error:` delegate method. * * It's required to provide a timestamp (in milliseconds) to indicate when the input image is sent * to the object detector. The input timestamps must be monotonically increasing. * + * This method supports classification of RGBA images. If your `MPPImage` has a source type of + * `MPPImageSourceTypePixelBuffer` or `MPPImageSourceTypeSampleBuffer`, the underlying pixel buffer + * must have one of the following pixel format types: + * 1. kCVPixelFormatType_32BGRA + * 2. kCVPixelFormatType_32RGBA + * + * If the input `MPPImage` has a source type of `MPPImageSourceTypeImage` ensure that the color + * space is RGB with an Alpha channel. + * + * If this method is used for classifying live camera frames using `AVFoundation`, ensure that you + * request `AVCaptureVideoDataOutput` to output frames in `kCMPixelFormat_32RGBA` using its + * `videoSettings` property. + * * @param image A live stream image data of type `MPPImage` on which object detection is to be * performed. 
* @param timestampInMilliseconds The timestamp (in milliseconds) which indicates when the input diff --git a/mediapipe/tasks/ios/vision/object_detector/sources/MPPObjectDetector.mm b/mediapipe/tasks/ios/vision/object_detector/sources/MPPObjectDetector.mm index f0914cdb1..a5b4077be 100644 --- a/mediapipe/tasks/ios/vision/object_detector/sources/MPPObjectDetector.mm +++ b/mediapipe/tasks/ios/vision/object_detector/sources/MPPObjectDetector.mm @@ -37,8 +37,8 @@ static NSString *const kImageOutStreamName = @"image_out"; static NSString *const kImageTag = @"IMAGE"; static NSString *const kNormRectStreamName = @"norm_rect_in"; static NSString *const kNormRectTag = @"NORM_RECT"; - static NSString *const kTaskGraphName = @"mediapipe.tasks.vision.ObjectDetectorGraph"; +static NSString *const kTaskName = @"objectDetector"; #define InputPacketMap(imagePacket, normalizedRectPacket) \ { \ @@ -51,6 +51,7 @@ static NSString *const kTaskGraphName = @"mediapipe.tasks.vision.ObjectDetectorG /** iOS Vision Task Runner */ MPPVisionTaskRunner *_visionTaskRunner; } +@property(nonatomic, weak) id objectDetectorLiveStreamDelegate; @end @implementation MPPObjectDetector @@ -78,11 +79,37 @@ static NSString *const kTaskGraphName = @"mediapipe.tasks.vision.ObjectDetectorG PacketsCallback packetsCallback = nullptr; - if (options.completion) { + if (options.objectDetectorLiveStreamDelegate) { + _objectDetectorLiveStreamDelegate = options.objectDetectorLiveStreamDelegate; + + // Capturing `self` as weak in order to avoid `self` being kept in memory + // and cause a retain cycle, after self is set to `nil`. + MPPObjectDetector *__weak weakSelf = self; + + // Create a private serial dispatch queue in which the delegate method will be called + // asynchronously. This is to ensure that if the client performs a long running operation in + // the delegate method, the queue on which the C++ callbacks is invoked is not blocked and is + // freed up to continue with its operations. 
+ dispatch_queue_t callbackQueue = dispatch_queue_create( + [MPPVisionTaskRunner uniqueDispatchQueueNameWithSuffix:kTaskName], NULL); packetsCallback = [=](absl::StatusOr statusOrPackets) { + if (!weakSelf) { + return; + } + if (![weakSelf.objectDetectorLiveStreamDelegate + respondsToSelector:@selector + (objectDetector:didFinishDetectionWithResult:timestampInMilliseconds:error:)]) { + return; + } + NSError *callbackError = nil; if (![MPPCommonUtils checkCppError:statusOrPackets.status() toError:&callbackError]) { - options.completion(nil, Timestamp::Unset().Value(), callbackError); + dispatch_async(callbackQueue, ^{ + [weakSelf.objectDetectorLiveStreamDelegate objectDetector:weakSelf + didFinishDetectionWithResult:nil + timestampInMilliseconds:Timestamp::Unset().Value() + error:callbackError]; + }); return; } @@ -95,10 +122,15 @@ static NSString *const kTaskGraphName = @"mediapipe.tasks.vision.ObjectDetectorG objectDetectionResultWithDetectionsPacket:statusOrPackets.value()[kDetectionsStreamName .cppString]]; - options.completion(result, - outputPacketMap[kImageOutStreamName.cppString].Timestamp().Value() / - kMicroSecondsPerMilliSecond, - callbackError); + NSInteger timeStampInMilliseconds = + outputPacketMap[kImageOutStreamName.cppString].Timestamp().Value() / + kMicroSecondsPerMilliSecond; + dispatch_async(callbackQueue, ^{ + [weakSelf.objectDetectorLiveStreamDelegate objectDetector:weakSelf + didFinishDetectionWithResult:result + timestampInMilliseconds:timeStampInMilliseconds + error:callbackError]; + }); }; } @@ -112,6 +144,7 @@ static NSString *const kTaskGraphName = @"mediapipe.tasks.vision.ObjectDetectorG return nil; } } + return self; } @@ -224,5 +257,4 @@ static NSString *const kTaskGraphName = @"mediapipe.tasks.vision.ObjectDetectorG return [_visionTaskRunner processLiveStreamPacketMap:inputPacketMap.value() error:error]; } - @end diff --git a/mediapipe/tasks/ios/vision/object_detector/sources/MPPObjectDetectorOptions.h b/mediapipe/tasks/ios/vision/object_detector/sources/MPPObjectDetectorOptions.h index 79bc9baa6..c91e170c9 100644 --- a/mediapipe/tasks/ios/vision/object_detector/sources/MPPObjectDetectorOptions.h +++ b/mediapipe/tasks/ios/vision/object_detector/sources/MPPObjectDetectorOptions.h @@ -20,19 +20,70 @@ NS_ASSUME_NONNULL_BEGIN +@class MPPObjectDetector; + +/** + * This protocol defines an interface for the delegates of `MPPObjectDetector` object to receive + * results of performing asynchronous object detection on images (i.e, when `runningMode` = + * `MPPRunningModeLiveStream`). + * + * The delegate of `MPPObjectDetector` must adopt `MPPObjectDetectorLiveStreamDelegate` protocol. + * The methods in this protocol are optional. + */ +NS_SWIFT_NAME(ObjectDetectorLiveStreamDelegate) +@protocol MPPObjectDetectorLiveStreamDelegate + +@optional + +/** + * This method notifies a delegate that the results of asynchronous object detection of + * an image submitted to the `MPPObjectDetector` is available. + * + * This method is called on a private serial dispatch queue created by the `MPPObjectDetector` + * for performing the asynchronous delegates calls. + * + * @param objectDetector The object detector which performed the object detection. + * This is useful to test equality when there are multiple instances of `MPPObjectDetector`. + * @param result The `MPPObjectDetectionResult` object that contains a list of detections, each + * detection has a bounding box that is expressed in the unrotated input frame of reference + * coordinates system, i.e. 
in `[0,image_width) x [0,image_height)`, which are the dimensions of the + * underlying image data. + * @param timestampInMilliseconds The timestamp (in milliseconds) which indicates when the input + * image was sent to the object detector. + * @param error An optional error parameter populated when there is an error in performing object + * detection on the input live stream image data. + * + */ +- (void)objectDetector:(MPPObjectDetector *)objectDetector + didFinishDetectionWithResult:(nullable MPPObjectDetectionResult *)result + timestampInMilliseconds:(NSInteger)timestampInMilliseconds + error:(nullable NSError *)error + NS_SWIFT_NAME(objectDetector(_:didFinishDetection:timestampInMilliseconds:error:)); +@end + /** Options for setting up a `MPPObjectDetector`. */ NS_SWIFT_NAME(ObjectDetectorOptions) @interface MPPObjectDetectorOptions : MPPTaskOptions +/** + * Running mode of the object detector task. Defaults to `MPPRunningModeImage`. + * `MPPImageClassifier` can be created with one of the following running modes: + * 1. `MPPRunningModeImage`: The mode for performing object detection on single image inputs. + * 2. `MPPRunningModeVideo`: The mode for performing object detection on the decoded frames of a + * video. + * 3. `MPPRunningModeLiveStream`: The mode for performing object detection on a live stream of + * input data, such as from the camera. + */ @property(nonatomic) MPPRunningMode runningMode; /** - * The user-defined result callback for processing live stream data. The result callback should only - * be specified when the running mode is set to the live stream mode. - * TODO: Add parameter `MPPImage` in the callback. + * An object that confirms to `MPPObjectDetectorLiveStreamDelegate` protocol. This object must + * implement `objectDetector:didFinishDetectionWithResult:timestampInMilliseconds:error:` to receive + * the results of performing asynchronous object detection on images (i.e, when `runningMode` = + * `MPPRunningModeLiveStream`). */ -@property(nonatomic, copy) void (^completion) - (MPPObjectDetectionResult *__nullable result, NSInteger timestampMs, NSError *error); +@property(nonatomic, weak, nullable) id + objectDetectorLiveStreamDelegate; /** * The locale to use for display names specified through the TFLite Model Metadata, if any. 
Defaults diff --git a/mediapipe/tasks/ios/vision/object_detector/sources/MPPObjectDetectorOptions.m b/mediapipe/tasks/ios/vision/object_detector/sources/MPPObjectDetectorOptions.m index 73f8ce5b5..b93a6b30b 100644 --- a/mediapipe/tasks/ios/vision/object_detector/sources/MPPObjectDetectorOptions.m +++ b/mediapipe/tasks/ios/vision/object_detector/sources/MPPObjectDetectorOptions.m @@ -33,7 +33,7 @@ objectDetectorOptions.categoryDenylist = self.categoryDenylist; objectDetectorOptions.categoryAllowlist = self.categoryAllowlist; objectDetectorOptions.displayNamesLocale = self.displayNamesLocale; - objectDetectorOptions.completion = self.completion; + objectDetectorOptions.objectDetectorLiveStreamDelegate = self.objectDetectorLiveStreamDelegate; return objectDetectorOptions; } diff --git a/mediapipe/tasks/java/com/google/mediapipe/tasks/vision/objectdetector/ObjectDetector.java b/mediapipe/tasks/java/com/google/mediapipe/tasks/vision/objectdetector/ObjectDetector.java index 5287ba325..d9a36cce7 100644 --- a/mediapipe/tasks/java/com/google/mediapipe/tasks/vision/objectdetector/ObjectDetector.java +++ b/mediapipe/tasks/java/com/google/mediapipe/tasks/vision/objectdetector/ObjectDetector.java @@ -39,6 +39,7 @@ import com.google.mediapipe.formats.proto.DetectionProto.Detection; import java.io.File; import java.io.IOException; import java.nio.ByteBuffer; +import java.util.ArrayList; import java.util.Arrays; import java.util.Collections; import java.util.List; @@ -170,6 +171,13 @@ public final class ObjectDetector extends BaseVisionTaskApi { new OutputHandler.OutputPacketConverter() { @Override public ObjectDetectionResult convertToTaskResult(List packets) { + // If there is no object detected in the image, just returns empty lists. + if (packets.get(DETECTIONS_OUT_STREAM_INDEX).isEmpty()) { + return ObjectDetectionResult.create( + new ArrayList<>(), + BaseVisionTaskApi.generateResultTimestampMs( + detectorOptions.runningMode(), packets.get(DETECTIONS_OUT_STREAM_INDEX))); + } return ObjectDetectionResult.create( PacketGetter.getProtoVector( packets.get(DETECTIONS_OUT_STREAM_INDEX), Detection.parser()), diff --git a/mediapipe/tasks/java/com/google/mediapipe/tasks/vision/poselandmarker/PoseLandmarker.java b/mediapipe/tasks/java/com/google/mediapipe/tasks/vision/poselandmarker/PoseLandmarker.java index 2ebdc0732..fa2d3da17 100644 --- a/mediapipe/tasks/java/com/google/mediapipe/tasks/vision/poselandmarker/PoseLandmarker.java +++ b/mediapipe/tasks/java/com/google/mediapipe/tasks/vision/poselandmarker/PoseLandmarker.java @@ -79,8 +79,7 @@ public final class PoseLandmarker extends BaseVisionTaskApi { private static final int LANDMARKS_OUT_STREAM_INDEX = 0; private static final int WORLD_LANDMARKS_OUT_STREAM_INDEX = 1; - private static final int AUXILIARY_LANDMARKS_OUT_STREAM_INDEX = 2; - private static final int IMAGE_OUT_STREAM_INDEX = 3; + private static final int IMAGE_OUT_STREAM_INDEX = 2; private static int segmentationMasksOutStreamIndex = -1; private static final String TASK_GRAPH_NAME = "mediapipe.tasks.vision.pose_landmarker.PoseLandmarkerGraph"; @@ -145,7 +144,6 @@ public final class PoseLandmarker extends BaseVisionTaskApi { List outputStreams = new ArrayList<>(); outputStreams.add("NORM_LANDMARKS:pose_landmarks"); outputStreams.add("WORLD_LANDMARKS:world_landmarks"); - outputStreams.add("AUXILIARY_LANDMARKS:auxiliary_landmarks"); outputStreams.add("IMAGE:image_out"); if (landmarkerOptions.outputSegmentationMasks()) { outputStreams.add("SEGMENTATION_MASK:segmentation_masks"); @@ -161,7 +159,6 @@ 
public final class PoseLandmarker extends BaseVisionTaskApi { // If there is no poses detected in the image, just returns empty lists. if (packets.get(LANDMARKS_OUT_STREAM_INDEX).isEmpty()) { return PoseLandmarkerResult.create( - new ArrayList<>(), new ArrayList<>(), new ArrayList<>(), Optional.empty(), @@ -179,9 +176,6 @@ public final class PoseLandmarker extends BaseVisionTaskApi { packets.get(LANDMARKS_OUT_STREAM_INDEX), NormalizedLandmarkList.parser()), PacketGetter.getProtoVector( packets.get(WORLD_LANDMARKS_OUT_STREAM_INDEX), LandmarkList.parser()), - PacketGetter.getProtoVector( - packets.get(AUXILIARY_LANDMARKS_OUT_STREAM_INDEX), - NormalizedLandmarkList.parser()), segmentedMasks, BaseVisionTaskApi.generateResultTimestampMs( landmarkerOptions.runningMode(), packets.get(LANDMARKS_OUT_STREAM_INDEX))); diff --git a/mediapipe/tasks/java/com/google/mediapipe/tasks/vision/poselandmarker/PoseLandmarkerResult.java b/mediapipe/tasks/java/com/google/mediapipe/tasks/vision/poselandmarker/PoseLandmarkerResult.java index 488f2a556..389e78266 100644 --- a/mediapipe/tasks/java/com/google/mediapipe/tasks/vision/poselandmarker/PoseLandmarkerResult.java +++ b/mediapipe/tasks/java/com/google/mediapipe/tasks/vision/poselandmarker/PoseLandmarkerResult.java @@ -40,7 +40,6 @@ public abstract class PoseLandmarkerResult implements TaskResult { static PoseLandmarkerResult create( List landmarksProto, List worldLandmarksProto, - List auxiliaryLandmarksProto, Optional> segmentationMasksData, long timestampMs) { @@ -52,7 +51,6 @@ public abstract class PoseLandmarkerResult implements TaskResult { List> multiPoseLandmarks = new ArrayList<>(); List> multiPoseWorldLandmarks = new ArrayList<>(); - List> multiPoseAuxiliaryLandmarks = new ArrayList<>(); for (LandmarkProto.NormalizedLandmarkList poseLandmarksProto : landmarksProto) { List poseLandmarks = new ArrayList<>(); multiPoseLandmarks.add(poseLandmarks); @@ -75,24 +73,10 @@ public abstract class PoseLandmarkerResult implements TaskResult { poseWorldLandmarkProto.getZ())); } } - for (LandmarkProto.NormalizedLandmarkList poseAuxiliaryLandmarksProto : - auxiliaryLandmarksProto) { - List poseAuxiliaryLandmarks = new ArrayList<>(); - multiPoseAuxiliaryLandmarks.add(poseAuxiliaryLandmarks); - for (LandmarkProto.NormalizedLandmark poseAuxiliaryLandmarkProto : - poseAuxiliaryLandmarksProto.getLandmarkList()) { - poseAuxiliaryLandmarks.add( - NormalizedLandmark.create( - poseAuxiliaryLandmarkProto.getX(), - poseAuxiliaryLandmarkProto.getY(), - poseAuxiliaryLandmarkProto.getZ())); - } - } return new AutoValue_PoseLandmarkerResult( timestampMs, Collections.unmodifiableList(multiPoseLandmarks), Collections.unmodifiableList(multiPoseWorldLandmarks), - Collections.unmodifiableList(multiPoseAuxiliaryLandmarks), multiPoseSegmentationMasks); } @@ -105,9 +89,6 @@ public abstract class PoseLandmarkerResult implements TaskResult { /** Pose landmarks in world coordniates of detected poses. */ public abstract List> worldLandmarks(); - /** Pose auxiliary landmarks. */ - public abstract List> auxiliaryLandmarks(); - /** Pose segmentation masks. 
*/ public abstract Optional> segmentationMasks(); } diff --git a/mediapipe/tasks/javatests/com/google/mediapipe/tasks/vision/objectdetector/ObjectDetectorTest.java b/mediapipe/tasks/javatests/com/google/mediapipe/tasks/vision/objectdetector/ObjectDetectorTest.java index 33aa025d2..20ddfcef6 100644 --- a/mediapipe/tasks/javatests/com/google/mediapipe/tasks/vision/objectdetector/ObjectDetectorTest.java +++ b/mediapipe/tasks/javatests/com/google/mediapipe/tasks/vision/objectdetector/ObjectDetectorTest.java @@ -45,6 +45,7 @@ import org.junit.runners.Suite.SuiteClasses; @SuiteClasses({ObjectDetectorTest.General.class, ObjectDetectorTest.RunningModeTest.class}) public class ObjectDetectorTest { private static final String MODEL_FILE = "coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.tflite"; + private static final String NO_NMS_MODEL_FILE = "efficientdet_lite0_fp16_no_nms.tflite"; private static final String CAT_AND_DOG_IMAGE = "cats_and_dogs.jpg"; private static final String CAT_AND_DOG_ROTATED_IMAGE = "cats_and_dogs_rotated.jpg"; private static final int IMAGE_WIDTH = 1200; @@ -109,6 +110,20 @@ public class ObjectDetectorTest { assertContainsOnlyCat(results, CAT_BOUNDING_BOX, CAT_SCORE); } + @Test + public void detect_succeedsWithNoObjectDetected() throws Exception { + ObjectDetectorOptions options = + ObjectDetectorOptions.builder() + .setBaseOptions(BaseOptions.builder().setModelAssetPath(NO_NMS_MODEL_FILE).build()) + .setScoreThreshold(1.0f) + .build(); + ObjectDetector objectDetector = + ObjectDetector.createFromOptions(ApplicationProvider.getApplicationContext(), options); + ObjectDetectionResult results = objectDetector.detect(getImageFromAsset(CAT_AND_DOG_IMAGE)); + // The score threshold should block objects. + assertThat(results.detections()).isEmpty(); + } + @Test public void detect_succeedsWithAllowListOption() throws Exception { ObjectDetectorOptions options = diff --git a/mediapipe/tasks/javatests/com/google/mediapipe/tasks/vision/poselandmarker/PoseLandmarkerTest.java b/mediapipe/tasks/javatests/com/google/mediapipe/tasks/vision/poselandmarker/PoseLandmarkerTest.java index 1d0b1decd..7adef9e27 100644 --- a/mediapipe/tasks/javatests/com/google/mediapipe/tasks/vision/poselandmarker/PoseLandmarkerTest.java +++ b/mediapipe/tasks/javatests/com/google/mediapipe/tasks/vision/poselandmarker/PoseLandmarkerTest.java @@ -330,7 +330,6 @@ public class PoseLandmarkerTest { return PoseLandmarkerResult.create( Arrays.asList(landmarksDetectionResultProto.getLandmarks()), Arrays.asList(landmarksDetectionResultProto.getWorldLandmarks()), - Arrays.asList(), Optional.empty(), /* timestampMs= */ 0); } diff --git a/mediapipe/tasks/python/test/vision/object_detector_test.py b/mediapipe/tasks/python/test/vision/object_detector_test.py index 7878e7f52..adeddafd7 100644 --- a/mediapipe/tasks/python/test/vision/object_detector_test.py +++ b/mediapipe/tasks/python/test/vision/object_detector_test.py @@ -44,6 +44,7 @@ _ObjectDetectorOptions = object_detector.ObjectDetectorOptions _RUNNING_MODE = running_mode_module.VisionTaskRunningMode _MODEL_FILE = 'coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.tflite' +_NO_NMS_MODEL_FILE = 'efficientdet_lite0_fp16_no_nms.tflite' _IMAGE_FILE = 'cats_and_dogs.jpg' _EXPECTED_DETECTION_RESULT = _DetectionResult( detections=[ @@ -304,7 +305,7 @@ class ObjectDetectorTest(parameterized.TestCase): with _ObjectDetector.create_from_options(options) as unused_detector: pass - def test_empty_detection_outputs(self): + def test_empty_detection_outputs_with_in_model_nms(self): options = 
_ObjectDetectorOptions( base_options=_BaseOptions(model_asset_path=self.model_path), score_threshold=1, @@ -314,6 +315,18 @@ class ObjectDetectorTest(parameterized.TestCase): detection_result = detector.detect(self.test_image) self.assertEmpty(detection_result.detections) + def test_empty_detection_outputs_without_in_model_nms(self): + options = _ObjectDetectorOptions( + base_options=_BaseOptions( + model_asset_path=test_utils.get_test_data_path( + os.path.join(_TEST_DATA_DIR, _NO_NMS_MODEL_FILE))), + score_threshold=1, + ) + with _ObjectDetector.create_from_options(options) as detector: + # Performs object detection on the input. + detection_result = detector.detect(self.test_image) + self.assertEmpty(detection_result.detections) + def test_missing_result_callback(self): options = _ObjectDetectorOptions( base_options=_BaseOptions(model_asset_path=self.model_path), diff --git a/mediapipe/tasks/python/test/vision/pose_landmarker_test.py b/mediapipe/tasks/python/test/vision/pose_landmarker_test.py index 1b73ecdfb..fff6879cc 100644 --- a/mediapipe/tasks/python/test/vision/pose_landmarker_test.py +++ b/mediapipe/tasks/python/test/vision/pose_landmarker_test.py @@ -74,7 +74,6 @@ def _get_expected_pose_landmarker_result( return PoseLandmarkerResult( pose_landmarks=[landmarks_detection_result.landmarks], pose_world_landmarks=[], - pose_auxiliary_landmarks=[], ) @@ -296,7 +295,6 @@ class PoseLandmarkerTest(parameterized.TestCase): # Comparing results. self.assertEmpty(detection_result.pose_landmarks) self.assertEmpty(detection_result.pose_world_landmarks) - self.assertEmpty(detection_result.pose_auxiliary_landmarks) def test_missing_result_callback(self): options = _PoseLandmarkerOptions( @@ -391,7 +389,7 @@ class PoseLandmarkerTest(parameterized.TestCase): True, _get_expected_pose_landmarker_result(_POSE_LANDMARKS), ), - (_BURGER_IMAGE, 0, False, PoseLandmarkerResult([], [], [])), + (_BURGER_IMAGE, 0, False, PoseLandmarkerResult([], [])), ) def test_detect_for_video( self, image_path, rotation, output_segmentation_masks, expected_result @@ -473,7 +471,7 @@ class PoseLandmarkerTest(parameterized.TestCase): True, _get_expected_pose_landmarker_result(_POSE_LANDMARKS), ), - (_BURGER_IMAGE, 0, False, PoseLandmarkerResult([], [], [])), + (_BURGER_IMAGE, 0, False, PoseLandmarkerResult([], [])), ) def test_detect_async_calls( self, image_path, rotation, output_segmentation_masks, expected_result diff --git a/mediapipe/tasks/python/vision/object_detector.py b/mediapipe/tasks/python/vision/object_detector.py index 3bdd1b5de..380d57c22 100644 --- a/mediapipe/tasks/python/vision/object_detector.py +++ b/mediapipe/tasks/python/vision/object_detector.py @@ -198,6 +198,15 @@ class ObjectDetector(base_vision_task_api.BaseVisionTaskApi): def packets_callback(output_packets: Mapping[str, packet_module.Packet]): if output_packets[_IMAGE_OUT_STREAM_NAME].is_empty(): return + image = packet_getter.get_image(output_packets[_IMAGE_OUT_STREAM_NAME]) + if output_packets[_DETECTIONS_OUT_STREAM_NAME].is_empty(): + empty_packet = output_packets[_DETECTIONS_OUT_STREAM_NAME] + options.result_callback( + ObjectDetectorResult([]), + image, + empty_packet.timestamp.value // _MICRO_SECONDS_PER_MILLISECOND, + ) + return detection_proto_list = packet_getter.get_proto_list( output_packets[_DETECTIONS_OUT_STREAM_NAME] ) @@ -207,7 +216,6 @@ class ObjectDetector(base_vision_task_api.BaseVisionTaskApi): for result in detection_proto_list ] ) - image = packet_getter.get_image(output_packets[_IMAGE_OUT_STREAM_NAME]) timestamp = 
output_packets[_IMAGE_OUT_STREAM_NAME].timestamp options.result_callback(detection_result, image, timestamp) @@ -266,6 +274,8 @@ class ObjectDetector(base_vision_task_api.BaseVisionTaskApi): normalized_rect.to_pb2() ), }) + if output_packets[_DETECTIONS_OUT_STREAM_NAME].is_empty(): + return ObjectDetectorResult([]) detection_proto_list = packet_getter.get_proto_list( output_packets[_DETECTIONS_OUT_STREAM_NAME] ) @@ -315,6 +325,8 @@ class ObjectDetector(base_vision_task_api.BaseVisionTaskApi): normalized_rect.to_pb2() ).at(timestamp_ms * _MICRO_SECONDS_PER_MILLISECOND), }) + if output_packets[_DETECTIONS_OUT_STREAM_NAME].is_empty(): + return ObjectDetectorResult([]) detection_proto_list = packet_getter.get_proto_list( output_packets[_DETECTIONS_OUT_STREAM_NAME] ) diff --git a/mediapipe/tasks/python/vision/pose_landmarker.py b/mediapipe/tasks/python/vision/pose_landmarker.py index 3ff7edb0a..8f67e6739 100644 --- a/mediapipe/tasks/python/vision/pose_landmarker.py +++ b/mediapipe/tasks/python/vision/pose_landmarker.py @@ -49,8 +49,6 @@ _NORM_LANDMARKS_STREAM_NAME = 'norm_landmarks' _NORM_LANDMARKS_TAG = 'NORM_LANDMARKS' _POSE_WORLD_LANDMARKS_STREAM_NAME = 'world_landmarks' _POSE_WORLD_LANDMARKS_TAG = 'WORLD_LANDMARKS' -_POSE_AUXILIARY_LANDMARKS_STREAM_NAME = 'auxiliary_landmarks' -_POSE_AUXILIARY_LANDMARKS_TAG = 'AUXILIARY_LANDMARKS' _TASK_GRAPH_NAME = 'mediapipe.tasks.vision.pose_landmarker.PoseLandmarkerGraph' _MICRO_SECONDS_PER_MILLISECOND = 1000 @@ -62,14 +60,11 @@ class PoseLandmarkerResult: Attributes: pose_landmarks: Detected pose landmarks in normalized image coordinates. pose_world_landmarks: Detected pose landmarks in world coordinates. - pose_auxiliary_landmarks: Detected auxiliary landmarks, used for deriving - ROI for next frame. segmentation_masks: Optional segmentation masks for pose. 
""" pose_landmarks: List[List[landmark_module.NormalizedLandmark]] pose_world_landmarks: List[List[landmark_module.Landmark]] - pose_auxiliary_landmarks: List[List[landmark_module.NormalizedLandmark]] segmentation_masks: Optional[List[image_module.Image]] = None @@ -77,7 +72,7 @@ def _build_landmarker_result( output_packets: Mapping[str, packet_module.Packet] ) -> PoseLandmarkerResult: """Constructs a `PoseLandmarkerResult` from output packets.""" - pose_landmarker_result = PoseLandmarkerResult([], [], []) + pose_landmarker_result = PoseLandmarkerResult([], []) if _SEGMENTATION_MASK_STREAM_NAME in output_packets: pose_landmarker_result.segmentation_masks = packet_getter.get_image_list( @@ -90,9 +85,6 @@ def _build_landmarker_result( pose_world_landmarks_proto_list = packet_getter.get_proto_list( output_packets[_POSE_WORLD_LANDMARKS_STREAM_NAME] ) - pose_auxiliary_landmarks_proto_list = packet_getter.get_proto_list( - output_packets[_POSE_AUXILIARY_LANDMARKS_STREAM_NAME] - ) for proto in pose_landmarks_proto_list: pose_landmarks = landmark_pb2.NormalizedLandmarkList() @@ -116,19 +108,6 @@ def _build_landmarker_result( pose_world_landmarks_list ) - for proto in pose_auxiliary_landmarks_proto_list: - pose_auxiliary_landmarks = landmark_pb2.NormalizedLandmarkList() - pose_auxiliary_landmarks.MergeFrom(proto) - pose_auxiliary_landmarks_list = [] - for pose_auxiliary_landmark in pose_auxiliary_landmarks.landmark: - pose_auxiliary_landmarks_list.append( - landmark_module.NormalizedLandmark.create_from_pb2( - pose_auxiliary_landmark - ) - ) - pose_landmarker_result.pose_auxiliary_landmarks.append( - pose_auxiliary_landmarks_list - ) return pose_landmarker_result @@ -301,7 +280,7 @@ class PoseLandmarker(base_vision_task_api.BaseVisionTaskApi): if output_packets[_NORM_LANDMARKS_STREAM_NAME].is_empty(): empty_packet = output_packets[_NORM_LANDMARKS_STREAM_NAME] options.result_callback( - PoseLandmarkerResult([], [], []), + PoseLandmarkerResult([], []), image, empty_packet.timestamp.value // _MICRO_SECONDS_PER_MILLISECOND, ) @@ -320,10 +299,6 @@ class PoseLandmarker(base_vision_task_api.BaseVisionTaskApi): ':'.join( [_POSE_WORLD_LANDMARKS_TAG, _POSE_WORLD_LANDMARKS_STREAM_NAME] ), - ':'.join([ - _POSE_AUXILIARY_LANDMARKS_TAG, - _POSE_AUXILIARY_LANDMARKS_STREAM_NAME, - ]), ':'.join([_IMAGE_TAG, _IMAGE_OUT_STREAM_NAME]), ] @@ -382,7 +357,7 @@ class PoseLandmarker(base_vision_task_api.BaseVisionTaskApi): }) if output_packets[_NORM_LANDMARKS_STREAM_NAME].is_empty(): - return PoseLandmarkerResult([], [], []) + return PoseLandmarkerResult([], []) return _build_landmarker_result(output_packets) @@ -427,7 +402,7 @@ class PoseLandmarker(base_vision_task_api.BaseVisionTaskApi): }) if output_packets[_NORM_LANDMARKS_STREAM_NAME].is_empty(): - return PoseLandmarkerResult([], [], []) + return PoseLandmarkerResult([], []) return _build_landmarker_result(output_packets) diff --git a/mediapipe/tasks/web/vision/BUILD b/mediapipe/tasks/web/vision/BUILD index 503db3252..10e98de8b 100644 --- a/mediapipe/tasks/web/vision/BUILD +++ b/mediapipe/tasks/web/vision/BUILD @@ -21,6 +21,7 @@ VISION_LIBS = [ "//mediapipe/tasks/web/core:fileset_resolver", "//mediapipe/tasks/web/vision/core:drawing_utils", "//mediapipe/tasks/web/vision/core:image", + "//mediapipe/tasks/web/vision/core:mask", "//mediapipe/tasks/web/vision/face_detector", "//mediapipe/tasks/web/vision/face_landmarker", "//mediapipe/tasks/web/vision/face_stylizer", diff --git a/mediapipe/tasks/web/vision/core/BUILD b/mediapipe/tasks/web/vision/core/BUILD index 
c53247ba7..325603353 100644 --- a/mediapipe/tasks/web/vision/core/BUILD +++ b/mediapipe/tasks/web/vision/core/BUILD @@ -41,7 +41,10 @@ mediapipe_ts_library( mediapipe_ts_library( name = "image", - srcs = ["image.ts"], + srcs = [ + "image.ts", + "image_shader_context.ts", + ], ) mediapipe_ts_library( @@ -56,12 +59,34 @@ jasmine_node_test( deps = [":image_test_lib"], ) +mediapipe_ts_library( + name = "mask", + srcs = ["mask.ts"], + deps = [":image"], +) + +mediapipe_ts_library( + name = "mask_test_lib", + testonly = True, + srcs = ["mask.test.ts"], + deps = [ + ":image", + ":mask", + ], +) + +jasmine_node_test( + name = "mask_test", + deps = [":mask_test_lib"], +) + mediapipe_ts_library( name = "vision_task_runner", srcs = ["vision_task_runner.ts"], deps = [ ":image", ":image_processing_options", + ":mask", ":vision_task_options", "//mediapipe/framework/formats:rect_jspb_proto", "//mediapipe/tasks/web/core", @@ -91,7 +116,6 @@ mediapipe_ts_library( mediapipe_ts_library( name = "render_utils", srcs = ["render_utils.ts"], - deps = [":image"], ) jasmine_node_test( diff --git a/mediapipe/tasks/web/vision/core/image.test.ts b/mediapipe/tasks/web/vision/core/image.test.ts index 73eb44240..e92debc2e 100644 --- a/mediapipe/tasks/web/vision/core/image.test.ts +++ b/mediapipe/tasks/web/vision/core/image.test.ts @@ -16,7 +16,8 @@ import 'jasmine'; -import {MPImage, MPImageShaderContext, MPImageType} from './image'; +import {MPImage} from './image'; +import {MPImageShaderContext} from './image_shader_context'; const WIDTH = 2; const HEIGHT = 2; @@ -40,8 +41,6 @@ const IMAGE_2_3 = [ class MPImageTestContext { canvas!: OffscreenCanvas; gl!: WebGL2RenderingContext; - uint8ClampedArray!: Uint8ClampedArray; - float32Array!: Float32Array; imageData!: ImageData; imageBitmap!: ImageBitmap; webGLTexture!: WebGLTexture; @@ -55,17 +54,11 @@ class MPImageTestContext { const gl = this.gl; - this.uint8ClampedArray = new Uint8ClampedArray(pixels.length / 4); - this.float32Array = new Float32Array(pixels.length / 4); - for (let i = 0; i < this.uint8ClampedArray.length; ++i) { - this.uint8ClampedArray[i] = pixels[i * 4]; - this.float32Array[i] = pixels[i * 4] / 255; - } this.imageData = new ImageData(new Uint8ClampedArray(pixels), width, height); this.imageBitmap = await createImageBitmap(this.imageData); - this.webGLTexture = gl.createTexture()!; + this.webGLTexture = gl.createTexture()!; gl.bindTexture(gl.TEXTURE_2D, this.webGLTexture); gl.texImage2D( gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, this.imageBitmap); @@ -74,10 +67,6 @@ class MPImageTestContext { get(type: unknown) { switch (type) { - case Uint8ClampedArray: - return this.uint8ClampedArray; - case Float32Array: - return this.float32Array; case ImageData: return this.imageData; case ImageBitmap: @@ -125,25 +114,22 @@ class MPImageTestContext { gl.bindTexture(gl.TEXTURE_2D, null); + // Sanity check + expect(pixels.find(v => !!v)).toBeDefined(); + return pixels; } function assertEquality(image: MPImage, expected: ImageType): void { - if (expected instanceof Uint8ClampedArray) { - const result = image.get(MPImageType.UINT8_CLAMPED_ARRAY); - expect(result).toEqual(expected); - } else if (expected instanceof Float32Array) { - const result = image.get(MPImageType.FLOAT32_ARRAY); - expect(result).toEqual(expected); - } else if (expected instanceof ImageData) { - const result = image.get(MPImageType.IMAGE_DATA); + if (expected instanceof ImageData) { + const result = image.getAsImageData(); expect(result).toEqual(expected); } else if (expected instanceof 
ImageBitmap) { - const result = image.get(MPImageType.IMAGE_BITMAP); + const result = image.getAsImageBitmap(); expect(readPixelsFromImageBitmap(result)) .toEqual(readPixelsFromImageBitmap(expected)); } else { // WebGLTexture - const result = image.get(MPImageType.WEBGL_TEXTURE); + const result = image.getAsWebGLTexture(); expect(readPixelsFromWebGLTexture(result)) .toEqual(readPixelsFromWebGLTexture(expected)); } @@ -177,9 +163,7 @@ class MPImageTestContext { shaderContext.close(); } - const sources = skip ? - [] : - [Uint8ClampedArray, Float32Array, ImageData, ImageBitmap, WebGLTexture]; + const sources = skip ? [] : [ImageData, ImageBitmap, WebGLTexture]; for (let i = 0; i < sources.length; i++) { for (let j = 0; j < sources.length; j++) { @@ -202,11 +186,11 @@ class MPImageTestContext { const shaderContext = new MPImageShaderContext(); const image = new MPImage( - [context.webGLTexture], - /* ownsImageBitmap= */ false, /* ownsWebGLTexture= */ false, - context.canvas, shaderContext, WIDTH, HEIGHT); + [context.webGLTexture], /* ownsImageBitmap= */ false, + /* ownsWebGLTexture= */ false, context.canvas, shaderContext, WIDTH, + HEIGHT); - const result = image.clone().get(MPImageType.IMAGE_DATA); + const result = image.clone().getAsImageData(); expect(result).toEqual(context.imageData); shaderContext.close(); @@ -217,19 +201,19 @@ class MPImageTestContext { const shaderContext = new MPImageShaderContext(); const image = new MPImage( - [context.webGLTexture], - /* ownsImageBitmap= */ false, /* ownsWebGLTexture= */ false, - context.canvas, shaderContext, WIDTH, HEIGHT); + [context.webGLTexture], /* ownsImageBitmap= */ false, + /* ownsWebGLTexture= */ false, context.canvas, shaderContext, WIDTH, + HEIGHT); // Verify that we can mix the different shader modes by running them out of // order. 
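// --- Illustrative sketch (editor's addition, not part of the diff above). ---
// The renamed accessors exercised by these tests also cover the common
// application pattern of keeping pixels beyond the lifetime of a task-owned
// image: clone first, read via getAsImageData(), then close the clone. Only
// members introduced by this patch are used; `copyPixels` is a hypothetical
// helper name.
import {MPImage} from './image';

function copyPixels(image: MPImage): Uint8ClampedArray {
  // clone() duplicates the underlying containers so the data survives after
  // the task recycles its own copy; close() frees the clone's resources.
  const clone = image.clone();
  try {
    return new Uint8ClampedArray(clone.getAsImageData().data);
  } finally {
    clone.close();
  }
}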
- let result = image.get(MPImageType.IMAGE_DATA); + let result = image.getAsImageData(); expect(result).toEqual(context.imageData); - result = image.clone().get(MPImageType.IMAGE_DATA); + result = image.clone().getAsImageData(); expect(result).toEqual(context.imageData); - result = image.get(MPImageType.IMAGE_DATA); + result = image.getAsImageData(); expect(result).toEqual(context.imageData); shaderContext.close(); @@ -241,43 +225,21 @@ class MPImageTestContext { const shaderContext = new MPImageShaderContext(); const image = createImage(shaderContext, context.imageData, WIDTH, HEIGHT); - expect(image.has(MPImageType.IMAGE_DATA)).toBe(true); - expect(image.has(MPImageType.UINT8_CLAMPED_ARRAY)).toBe(false); - expect(image.has(MPImageType.FLOAT32_ARRAY)).toBe(false); - expect(image.has(MPImageType.WEBGL_TEXTURE)).toBe(false); - expect(image.has(MPImageType.IMAGE_BITMAP)).toBe(false); + expect(image.hasImageData()).toBe(true); + expect(image.hasWebGLTexture()).toBe(false); + expect(image.hasImageBitmap()).toBe(false); - image.get(MPImageType.UINT8_CLAMPED_ARRAY); + image.getAsWebGLTexture(); - expect(image.has(MPImageType.IMAGE_DATA)).toBe(true); - expect(image.has(MPImageType.UINT8_CLAMPED_ARRAY)).toBe(true); - expect(image.has(MPImageType.FLOAT32_ARRAY)).toBe(false); - expect(image.has(MPImageType.WEBGL_TEXTURE)).toBe(false); - expect(image.has(MPImageType.IMAGE_BITMAP)).toBe(false); + expect(image.hasImageData()).toBe(true); + expect(image.hasWebGLTexture()).toBe(true); + expect(image.hasImageBitmap()).toBe(false); - image.get(MPImageType.FLOAT32_ARRAY); + image.getAsImageBitmap(); - expect(image.has(MPImageType.IMAGE_DATA)).toBe(true); - expect(image.has(MPImageType.UINT8_CLAMPED_ARRAY)).toBe(true); - expect(image.has(MPImageType.FLOAT32_ARRAY)).toBe(true); - expect(image.has(MPImageType.WEBGL_TEXTURE)).toBe(false); - expect(image.has(MPImageType.IMAGE_BITMAP)).toBe(false); - - image.get(MPImageType.WEBGL_TEXTURE); - - expect(image.has(MPImageType.IMAGE_DATA)).toBe(true); - expect(image.has(MPImageType.UINT8_CLAMPED_ARRAY)).toBe(true); - expect(image.has(MPImageType.FLOAT32_ARRAY)).toBe(true); - expect(image.has(MPImageType.WEBGL_TEXTURE)).toBe(true); - expect(image.has(MPImageType.IMAGE_BITMAP)).toBe(false); - - image.get(MPImageType.IMAGE_BITMAP); - - expect(image.has(MPImageType.IMAGE_DATA)).toBe(true); - expect(image.has(MPImageType.UINT8_CLAMPED_ARRAY)).toBe(true); - expect(image.has(MPImageType.FLOAT32_ARRAY)).toBe(true); - expect(image.has(MPImageType.WEBGL_TEXTURE)).toBe(true); - expect(image.has(MPImageType.IMAGE_BITMAP)).toBe(true); + expect(image.hasImageData()).toBe(true); + expect(image.hasWebGLTexture()).toBe(true); + expect(image.hasImageBitmap()).toBe(true); image.close(); shaderContext.close(); diff --git a/mediapipe/tasks/web/vision/core/image.ts b/mediapipe/tasks/web/vision/core/image.ts index 7d6997d37..bcc6b7ca1 100644 --- a/mediapipe/tasks/web/vision/core/image.ts +++ b/mediapipe/tasks/web/vision/core/image.ts @@ -14,14 +14,10 @@ * limitations under the License. */ +import {assertNotNull, MPImageShaderContext} from '../../../../tasks/web/vision/core/image_shader_context'; + /** The underlying type of the image. */ -export enum MPImageType { - /** Represents the native `UInt8ClampedArray` type. */ - UINT8_CLAMPED_ARRAY, - /** - * Represents the native `Float32Array` type. Values range from [0.0, 1.0]. - */ - FLOAT32_ARRAY, +enum MPImageType { /** Represents the native `ImageData` type. */ IMAGE_DATA, /** Represents the native `ImageBitmap` type. 
*/ @@ -31,377 +27,16 @@ export enum MPImageType { } /** The supported image formats. For internal usage. */ -export type MPImageContainer = - Uint8ClampedArray|Float32Array|ImageData|ImageBitmap|WebGLTexture; - -const VERTEX_SHADER = ` - attribute vec2 aVertex; - attribute vec2 aTex; - varying vec2 vTex; - void main(void) { - gl_Position = vec4(aVertex, 0.0, 1.0); - vTex = aTex; - }`; - -const FRAGMENT_SHADER = ` - precision mediump float; - varying vec2 vTex; - uniform sampler2D inputTexture; - void main() { - gl_FragColor = texture2D(inputTexture, vTex); - } - `; - -function assertNotNull(value: T|null, msg: string): T { - if (value === null) { - throw new Error(`Unable to obtain required WebGL resource: ${msg}`); - } - return value; -} - -// TODO: Move internal-only types to different module. - -/** - * Utility class that encapsulates the buffers used by `MPImageShaderContext`. - * For internal use only. - */ -class MPImageShaderBuffers { - constructor( - private readonly gl: WebGL2RenderingContext, - private readonly vertexArrayObject: WebGLVertexArrayObject, - private readonly vertexBuffer: WebGLBuffer, - private readonly textureBuffer: WebGLBuffer) {} - - bind() { - this.gl.bindVertexArray(this.vertexArrayObject); - } - - unbind() { - this.gl.bindVertexArray(null); - } - - close() { - this.gl.deleteVertexArray(this.vertexArrayObject); - this.gl.deleteBuffer(this.vertexBuffer); - this.gl.deleteBuffer(this.textureBuffer); - } -} - -/** - * A class that encapsulates the shaders used by an MPImage. Can be re-used - * across MPImages that use the same WebGL2Rendering context. - * - * For internal use only. - */ -export class MPImageShaderContext { - private gl?: WebGL2RenderingContext; - private framebuffer?: WebGLFramebuffer; - private program?: WebGLProgram; - private vertexShader?: WebGLShader; - private fragmentShader?: WebGLShader; - private aVertex?: GLint; - private aTex?: GLint; - - /** - * The shader buffers used for passthrough renders that don't modify the - * input texture. - */ - private shaderBuffersPassthrough?: MPImageShaderBuffers; - - /** - * The shader buffers used for passthrough renders that flip the input texture - * vertically before conversion to a different type. This is used to flip the - * texture to the expected orientation for drawing in the browser. 
- */ - private shaderBuffersFlipVertically?: MPImageShaderBuffers; - - private compileShader(source: string, type: number): WebGLShader { - const gl = this.gl!; - const shader = - assertNotNull(gl.createShader(type), 'Failed to create WebGL shader'); - gl.shaderSource(shader, source); - gl.compileShader(shader); - if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) { - const info = gl.getShaderInfoLog(shader); - throw new Error(`Could not compile WebGL shader: ${info}`); - } - gl.attachShader(this.program!, shader); - return shader; - } - - private setupShaders(): void { - const gl = this.gl!; - this.program = - assertNotNull(gl.createProgram()!, 'Failed to create WebGL program'); - - this.vertexShader = this.compileShader(VERTEX_SHADER, gl.VERTEX_SHADER); - this.fragmentShader = - this.compileShader(FRAGMENT_SHADER, gl.FRAGMENT_SHADER); - - gl.linkProgram(this.program); - const linked = gl.getProgramParameter(this.program, gl.LINK_STATUS); - if (!linked) { - const info = gl.getProgramInfoLog(this.program); - throw new Error(`Error during program linking: ${info}`); - } - - this.aVertex = gl.getAttribLocation(this.program, 'aVertex'); - this.aTex = gl.getAttribLocation(this.program, 'aTex'); - } - - private createBuffers(flipVertically: boolean): MPImageShaderBuffers { - const gl = this.gl!; - const vertexArrayObject = - assertNotNull(gl.createVertexArray(), 'Failed to create vertex array'); - gl.bindVertexArray(vertexArrayObject); - - const vertexBuffer = - assertNotNull(gl.createBuffer(), 'Failed to create buffer'); - gl.bindBuffer(gl.ARRAY_BUFFER, vertexBuffer); - gl.enableVertexAttribArray(this.aVertex!); - gl.vertexAttribPointer(this.aVertex!, 2, gl.FLOAT, false, 0, 0); - gl.bufferData( - gl.ARRAY_BUFFER, new Float32Array([-1, -1, -1, 1, 1, 1, 1, -1]), - gl.STATIC_DRAW); - - const textureBuffer = - assertNotNull(gl.createBuffer(), 'Failed to create buffer'); - gl.bindBuffer(gl.ARRAY_BUFFER, textureBuffer); - gl.enableVertexAttribArray(this.aTex!); - gl.vertexAttribPointer(this.aTex!, 2, gl.FLOAT, false, 0, 0); - - const bufferData = - flipVertically ? [0, 1, 0, 0, 1, 0, 1, 1] : [0, 0, 0, 1, 1, 1, 1, 0]; - gl.bufferData( - gl.ARRAY_BUFFER, new Float32Array(bufferData), gl.STATIC_DRAW); - - gl.bindBuffer(gl.ARRAY_BUFFER, null); - gl.bindVertexArray(null); - - return new MPImageShaderBuffers( - gl, vertexArrayObject, vertexBuffer, textureBuffer); - } - - private getShaderBuffers(flipVertically: boolean): MPImageShaderBuffers { - if (flipVertically) { - if (!this.shaderBuffersFlipVertically) { - this.shaderBuffersFlipVertically = - this.createBuffers(/* flipVertically= */ true); - } - return this.shaderBuffersFlipVertically; - } else { - if (!this.shaderBuffersPassthrough) { - this.shaderBuffersPassthrough = - this.createBuffers(/* flipVertically= */ false); - } - return this.shaderBuffersPassthrough; - } - } - - private maybeInitGL(gl: WebGL2RenderingContext): void { - if (!this.gl) { - this.gl = gl; - } else if (gl !== this.gl) { - throw new Error('Cannot change GL context once initialized'); - } - } - - /** Runs the callback using the shader. */ - run( - gl: WebGL2RenderingContext, flipVertically: boolean, - callback: () => T): T { - this.maybeInitGL(gl); - - if (!this.program) { - this.setupShaders(); - } - - const shaderBuffers = this.getShaderBuffers(flipVertically); - gl.useProgram(this.program!); - shaderBuffers.bind(); - const result = callback(); - shaderBuffers.unbind(); - - return result; - } - - /** - * Binds a framebuffer to the canvas. 
If the framebuffer does not yet exist, - * creates it first. Binds the provided texture to the framebuffer. - */ - bindFramebuffer(gl: WebGL2RenderingContext, texture: WebGLTexture): void { - this.maybeInitGL(gl); - if (!this.framebuffer) { - this.framebuffer = - assertNotNull(gl.createFramebuffer(), 'Failed to create framebuffe.'); - } - gl.bindFramebuffer(gl.FRAMEBUFFER, this.framebuffer); - gl.framebufferTexture2D( - gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, texture, 0); - } - - unbindFramebuffer(): void { - this.gl?.bindFramebuffer(this.gl.FRAMEBUFFER, null); - } - - close() { - if (this.program) { - const gl = this.gl!; - gl.deleteProgram(this.program); - gl.deleteShader(this.vertexShader!); - gl.deleteShader(this.fragmentShader!); - } - if (this.framebuffer) { - this.gl!.deleteFramebuffer(this.framebuffer); - } - if (this.shaderBuffersPassthrough) { - this.shaderBuffersPassthrough.close(); - } - if (this.shaderBuffersFlipVertically) { - this.shaderBuffersFlipVertically.close(); - } - } -} - -/** A four channel color with a red, green, blue and alpha values. */ -export type RGBAColor = [number, number, number, number]; - -/** - * An interface that can be used to provide custom conversion functions. These - * functions are invoked to convert pixel values between different channel - * counts and value ranges. Any conversion function that is not specified will - * result in a default conversion. - */ -export interface MPImageChannelConverter { - /** - * A conversion function to convert a number in the [0.0, 1.0] range to RGBA. - * The output is an array with four elemeents whose values range from 0 to 255 - * inclusive. - * - * The default conversion function is `[v * 255, v * 255, v * 255, 255]` - * and will log a warning if invoked. - */ - floatToRGBAConverter?: (value: number) => RGBAColor; - - /* - * A conversion function to convert a number in the [0, 255] range to RGBA. - * The output is an array with four elemeents whose values range from 0 to 255 - * inclusive. - * - * The default conversion function is `[v, v , v , 255]` and will log a - * warning if invoked. - */ - uint8ToRGBAConverter?: (value: number) => RGBAColor; - - /** - * A conversion function to convert an RGBA value in the range of 0 to 255 to - * a single value in the [0.0, 1.0] range. - * - * The default conversion function is `(r / 3 + g / 3 + b / 3) / 255` and will - * log a warning if invoked. - */ - rgbaToFloatConverter?: (r: number, g: number, b: number, a: number) => number; - - /** - * A conversion function to convert an RGBA value in the range of 0 to 255 to - * a single value in the [0, 255] range. - * - * The default conversion function is `r / 3 + g / 3 + b / 3` and will log a - * warning if invoked. - */ - rgbaToUint8Converter?: (r: number, g: number, b: number, a: number) => number; - - /** - * A conversion function to convert a single value in the 0.0 to 1.0 range to - * [0, 255]. - * - * The default conversion function is `r * 255` and will log a warning if - * invoked. - */ - floatToUint8Converter?: (value: number) => number; - - /** - * A conversion function to convert a single value in the 0 to 255 range to - * [0.0, 1.0] . - * - * The default conversion function is `r / 255` and will log a warning if - * invoked. - */ - uint8ToFloatConverter?: (value: number) => number; -} -/** - * Color converter that falls back to a default implementation if the - * user-provided converter does not specify a conversion. 
- */ -class DefaultColorConverter implements Required { - private static readonly WARNINGS_LOGGED = new Set(); - - constructor(private readonly customConverter: MPImageChannelConverter) {} - - floatToRGBAConverter(v: number): RGBAColor { - if (this.customConverter.floatToRGBAConverter) { - return this.customConverter.floatToRGBAConverter(v); - } - this.logWarningOnce('floatToRGBAConverter'); - return [v * 255, v * 255, v * 255, 255]; - } - - uint8ToRGBAConverter(v: number): RGBAColor { - if (this.customConverter.uint8ToRGBAConverter) { - return this.customConverter.uint8ToRGBAConverter(v); - } - this.logWarningOnce('uint8ToRGBAConverter'); - return [v, v, v, 255]; - } - - rgbaToFloatConverter(r: number, g: number, b: number, a: number): number { - if (this.customConverter.rgbaToFloatConverter) { - return this.customConverter.rgbaToFloatConverter(r, g, b, a); - } - this.logWarningOnce('rgbaToFloatConverter'); - return (r / 3 + g / 3 + b / 3) / 255; - } - - rgbaToUint8Converter(r: number, g: number, b: number, a: number): number { - if (this.customConverter.rgbaToUint8Converter) { - return this.customConverter.rgbaToUint8Converter(r, g, b, a); - } - this.logWarningOnce('rgbaToUint8Converter'); - return r / 3 + g / 3 + b / 3; - } - - floatToUint8Converter(v: number): number { - if (this.customConverter.floatToUint8Converter) { - return this.customConverter.floatToUint8Converter(v); - } - this.logWarningOnce('floatToUint8Converter'); - return v * 255; - } - - uint8ToFloatConverter(v: number): number { - if (this.customConverter.uint8ToFloatConverter) { - return this.customConverter.uint8ToFloatConverter(v); - } - this.logWarningOnce('uint8ToFloatConverter'); - return v / 255; - } - - private logWarningOnce(methodName: string): void { - if (!DefaultColorConverter.WARNINGS_LOGGED.has(methodName)) { - console.log(`Using default ${methodName}`); - DefaultColorConverter.WARNINGS_LOGGED.add(methodName); - } - } -} +export type MPImageContainer = ImageData|ImageBitmap|WebGLTexture; /** * The wrapper class for MediaPipe Image objects. * * Images are stored as `ImageData`, `ImageBitmap` or `WebGLTexture` objects. * You can convert the underlying type to any other type by passing the - * desired type to `get()`. As type conversions can be expensive, it is + * desired type to `getAs...()`. As type conversions can be expensive, it is * recommended to limit these conversions. You can verify what underlying - * types are already available by invoking `has()`. + * types are already available by invoking `has...()`. * * Images that are returned from a MediaPipe Tasks are owned by by the * underlying C++ Task. If you need to extend the lifetime of these objects, @@ -413,21 +48,10 @@ class DefaultColorConverter implements Required { * initialized with an `OffscreenCanvas`. As we require WebGL2 support, this * places some limitations on Browser support as outlined here: * https://developer.mozilla.org/en-US/docs/Web/API/OffscreenCanvas/getContext - * - * Some MediaPipe tasks return single channel masks. These masks are stored - * using an underlying `Uint8ClampedArray` an `Float32Array` (represented as - * single-channel arrays). To convert these type to other formats a conversion - * function is invoked to convert pixel values between single channel and four - * channel RGBA values. To customize this conversion, you can specify these - * conversion functions when you invoke `get()`. If you use the default - * conversion function a warning will be logged to the console. 
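// --- Illustrative sketch (editor's addition, not part of the diff). ---
// With the channel-converter machinery removed, MPImage exposes one explicit
// accessor per container type. A caller might pick whichever representation is
// already materialized; only members defined in this file are assumed, and
// `draw` is a hypothetical helper name.
import {MPImage} from './image';

function draw(ctx: OffscreenCanvasRenderingContext2D, image: MPImage): void {
  if (image.hasImageBitmap()) {
    // Already materialized as an ImageBitmap; no conversion needed.
    ctx.drawImage(image.getAsImageBitmap(), 0, 0);
  } else {
    // getAsImageData() converts from a texture or bitmap when necessary,
    // which may involve a GPU-to-CPU readback.
    ctx.putImageData(image.getAsImageData(), 0, 0);
  }
}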
*/ export class MPImage { private gl?: WebGL2RenderingContext; - /** The underlying type of the image. */ - static TYPE = MPImageType; - /** @hideconstructor */ constructor( private readonly containers: MPImageContainer[], @@ -442,113 +66,60 @@ export class MPImage { readonly height: number, ) {} - /** - * Returns whether this `MPImage` stores the image in the desired format. - * This method can be called to reduce expensive conversion before invoking - * `get()`. - */ - has(type: MPImageType): boolean { - return !!this.getContainer(type); + /** Returns whether this `MPImage` contains a mask of type `ImageData`. */ + hasImageData(): boolean { + return !!this.getContainer(MPImageType.IMAGE_DATA); + } + + /** Returns whether this `MPImage` contains a mask of type `ImageBitmap`. */ + hasImageBitmap(): boolean { + return !!this.getContainer(MPImageType.IMAGE_BITMAP); + } + + /** Returns whether this `MPImage` contains a mask of type `WebGLTexture`. */ + hasWebGLTexture(): boolean { + return !!this.getContainer(MPImageType.WEBGL_TEXTURE); } - /** - * Returns the underlying image as a single channel `Uint8ClampedArray`. Note - * that this involves an expensive GPU to CPU transfer if the current image is - * only available as an `ImageBitmap` or `WebGLTexture`. If necessary, this - * function converts RGBA data pixel-by-pixel to a single channel value by - * invoking a conversion function (see class comment for detail). - * - * @param type The type of image to return. - * @param converter A set of conversion functions that will be invoked to - * convert the underlying pixel data if necessary. You may omit this - * function if the requested conversion does not change the pixel format. - * @return The current data as a Uint8ClampedArray. - */ - get(type: MPImageType.UINT8_CLAMPED_ARRAY, - converter?: MPImageChannelConverter): Uint8ClampedArray; - /** - * Returns the underlying image as a single channel `Float32Array`. Note - * that this involves an expensive GPU to CPU transfer if the current image is - * only available as an `ImageBitmap` or `WebGLTexture`. If necessary, this - * function converts RGBA data pixel-by-pixel to a single channel value by - * invoking a conversion function (see class comment for detail). - * - * @param type The type of image to return. - * @param converter A set of conversion functions that will be invoked to - * convert the underlying pixel data if necessary. You may omit this - * function if the requested conversion does not change the pixel format. - * @return The current image as a Float32Array. - */ - get(type: MPImageType.FLOAT32_ARRAY, - converter?: MPImageChannelConverter): Float32Array; /** * Returns the underlying image as an `ImageData` object. Note that this * involves an expensive GPU to CPU transfer if the current image is only - * available as an `ImageBitmap` or `WebGLTexture`. If necessary, this - * function converts single channel pixel values to RGBA by invoking a - * conversion function (see class comment for detail). + * available as an `ImageBitmap` or `WebGLTexture`. * * @return The current image as an ImageData object. */ - get(type: MPImageType.IMAGE_DATA, - converter?: MPImageChannelConverter): ImageData; + getAsImageData(): ImageData { + return this.convertToImageData(); + } + /** * Returns the underlying image as an `ImageBitmap`. Note that * conversions to `ImageBitmap` are expensive, especially if the data - * currently resides on CPU. 
If necessary, this function first converts single - * channel pixel values to RGBA by invoking a conversion function (see class - * comment for detail). + * currently resides on CPU. * * Processing with `ImageBitmap`s requires that the MediaPipe Task was * initialized with an `OffscreenCanvas` with WebGL2 support. See * https://developer.mozilla.org/en-US/docs/Web/API/OffscreenCanvas/getContext * for a list of supported platforms. * - * @param type The type of image to return. - * @param converter A set of conversion functions that will be invoked to - * convert the underlying pixel data if necessary. You may omit this - * function if the requested conversion does not change the pixel format. * @return The current image as an ImageBitmap object. */ - get(type: MPImageType.IMAGE_BITMAP, - converter?: MPImageChannelConverter): ImageBitmap; + getAsImageBitmap(): ImageBitmap { + return this.convertToImageBitmap(); + } + /** * Returns the underlying image as a `WebGLTexture` object. Note that this * involves a CPU to GPU transfer if the current image is only available as * an `ImageData` object. The returned texture is bound to the current * canvas (see `.canvas`). * - * @param type The type of image to return. - * @param converter A set of conversion functions that will be invoked to - * convert the underlying pixel data if necessary. You may omit this - * function if the requested conversion does not change the pixel format. * @return The current image as a WebGLTexture. */ - get(type: MPImageType.WEBGL_TEXTURE, - converter?: MPImageChannelConverter): WebGLTexture; - get(type?: MPImageType, - converter?: MPImageChannelConverter): MPImageContainer { - const internalConverter = new DefaultColorConverter(converter ?? {}); - switch (type) { - case MPImageType.UINT8_CLAMPED_ARRAY: - return this.convertToUint8ClampedArray(internalConverter); - case MPImageType.FLOAT32_ARRAY: - return this.convertToFloat32Array(internalConverter); - case MPImageType.IMAGE_DATA: - return this.convertToImageData(internalConverter); - case MPImageType.IMAGE_BITMAP: - return this.convertToImageBitmap(internalConverter); - case MPImageType.WEBGL_TEXTURE: - return this.convertToWebGLTexture(internalConverter); - default: - throw new Error(`Type is not supported: ${type}`); - } + getAsWebGLTexture(): WebGLTexture { + return this.convertToWebGLTexture(); } - - private getContainer(type: MPImageType.UINT8_CLAMPED_ARRAY): Uint8ClampedArray - |undefined; - private getContainer(type: MPImageType.FLOAT32_ARRAY): Float32Array|undefined; private getContainer(type: MPImageType.IMAGE_DATA): ImageData|undefined; private getContainer(type: MPImageType.IMAGE_BITMAP): ImageBitmap|undefined; private getContainer(type: MPImageType.WEBGL_TEXTURE): WebGLTexture|undefined; @@ -556,16 +127,16 @@ export class MPImage { /** Returns the container for the requested storage type iff it exists. 
*/ private getContainer(type: MPImageType): MPImageContainer|undefined { switch (type) { - case MPImageType.UINT8_CLAMPED_ARRAY: - return this.containers.find(img => img instanceof Uint8ClampedArray); - case MPImageType.FLOAT32_ARRAY: - return this.containers.find(img => img instanceof Float32Array); case MPImageType.IMAGE_DATA: return this.containers.find(img => img instanceof ImageData); case MPImageType.IMAGE_BITMAP: - return this.containers.find(img => img instanceof ImageBitmap); + return this.containers.find( + img => typeof ImageBitmap !== 'undefined' && + img instanceof ImageBitmap); case MPImageType.WEBGL_TEXTURE: - return this.containers.find(img => img instanceof WebGLTexture); + return this.containers.find( + img => typeof WebGLTexture !== 'undefined' && + img instanceof WebGLTexture); default: throw new Error(`Type is not supported: ${type}`); } @@ -586,11 +157,7 @@ export class MPImage { for (const container of this.containers) { let destinationContainer: MPImageContainer; - if (container instanceof Uint8ClampedArray) { - destinationContainer = new Uint8ClampedArray(container); - } else if (container instanceof Float32Array) { - destinationContainer = new Float32Array(container); - } else if (container instanceof ImageData) { + if (container instanceof ImageData) { destinationContainer = new ImageData(container.data, this.width, this.height); } else if (container instanceof WebGLTexture) { @@ -619,7 +186,7 @@ export class MPImage { this.unbindTexture(); } else if (container instanceof ImageBitmap) { - this.convertToWebGLTexture(new DefaultColorConverter({})); + this.convertToWebGLTexture(); this.bindTexture(); destinationContainer = this.copyTextureToBitmap(); this.unbindTexture(); @@ -631,9 +198,8 @@ export class MPImage { } return new MPImage( - destinationContainers, this.has(MPImageType.IMAGE_BITMAP), - this.has(MPImageType.WEBGL_TEXTURE), this.canvas, this.shaderContext, - this.width, this.height); + destinationContainers, this.hasImageBitmap(), this.hasWebGLTexture(), + this.canvas, this.shaderContext, this.width, this.height); } private getOffscreenCanvas(): OffscreenCanvas { @@ -667,11 +233,10 @@ export class MPImage { return this.shaderContext; } - private convertToImageBitmap(converter: Required): - ImageBitmap { + private convertToImageBitmap(): ImageBitmap { let imageBitmap = this.getContainer(MPImageType.IMAGE_BITMAP); if (!imageBitmap) { - this.convertToWebGLTexture(converter); + this.convertToWebGLTexture(); imageBitmap = this.convertWebGLTextureToImageBitmap(); this.containers.push(imageBitmap); this.ownsImageBitmap = true; @@ -680,115 +245,37 @@ export class MPImage { return imageBitmap; } - private convertToImageData(converter: Required): - ImageData { + private convertToImageData(): ImageData { let imageData = this.getContainer(MPImageType.IMAGE_DATA); if (!imageData) { - if (this.has(MPImageType.UINT8_CLAMPED_ARRAY)) { - const source = this.getContainer(MPImageType.UINT8_CLAMPED_ARRAY)!; - const destination = new Uint8ClampedArray(this.width * this.height * 4); - for (let i = 0; i < this.width * this.height; i++) { - const rgba = converter.uint8ToRGBAConverter(source[i]); - destination[i * 4] = rgba[0]; - destination[i * 4 + 1] = rgba[1]; - destination[i * 4 + 2] = rgba[2]; - destination[i * 4 + 3] = rgba[3]; - } - imageData = new ImageData(destination, this.width, this.height); - this.containers.push(imageData); - } else if (this.has(MPImageType.FLOAT32_ARRAY)) { - const source = this.getContainer(MPImageType.FLOAT32_ARRAY)!; - const destination = new 
Uint8ClampedArray(this.width * this.height * 4); - for (let i = 0; i < this.width * this.height; i++) { - const rgba = converter.floatToRGBAConverter(source[i]); - destination[i * 4] = rgba[0]; - destination[i * 4 + 1] = rgba[1]; - destination[i * 4 + 2] = rgba[2]; - destination[i * 4 + 3] = rgba[3]; - } - imageData = new ImageData(destination, this.width, this.height); - this.containers.push(imageData); - } else if ( - this.has(MPImageType.IMAGE_BITMAP) || - this.has(MPImageType.WEBGL_TEXTURE)) { - const gl = this.getGL(); - const shaderContext = this.getShaderContext(); - const pixels = new Uint8Array(this.width * this.height * 4); + const gl = this.getGL(); + const shaderContext = this.getShaderContext(); + const pixels = new Uint8Array(this.width * this.height * 4); - // Create texture if needed - const webGlTexture = this.convertToWebGLTexture(converter); + // Create texture if needed + const webGlTexture = this.convertToWebGLTexture(); - // Create a framebuffer from the texture and read back pixels - shaderContext.bindFramebuffer(gl, webGlTexture); - gl.readPixels( - 0, 0, this.width, this.height, gl.RGBA, gl.UNSIGNED_BYTE, pixels); - shaderContext.unbindFramebuffer(); + // Create a framebuffer from the texture and read back pixels + shaderContext.bindFramebuffer(gl, webGlTexture); + gl.readPixels( + 0, 0, this.width, this.height, gl.RGBA, gl.UNSIGNED_BYTE, pixels); + shaderContext.unbindFramebuffer(); - imageData = new ImageData( - new Uint8ClampedArray(pixels.buffer), this.width, this.height); - this.containers.push(imageData); - } else { - throw new Error('Couldn\t find backing image for ImageData conversion'); - } + imageData = new ImageData( + new Uint8ClampedArray(pixels.buffer), this.width, this.height); + this.containers.push(imageData); } return imageData; } - private convertToUint8ClampedArray( - converter: Required): Uint8ClampedArray { - let uint8ClampedArray = this.getContainer(MPImageType.UINT8_CLAMPED_ARRAY); - if (!uint8ClampedArray) { - if (this.has(MPImageType.FLOAT32_ARRAY)) { - const source = this.getContainer(MPImageType.FLOAT32_ARRAY)!; - uint8ClampedArray = new Uint8ClampedArray( - source.map(v => converter.floatToUint8Converter(v))); - } else { - const source = this.convertToImageData(converter).data; - uint8ClampedArray = new Uint8ClampedArray(this.width * this.height); - for (let i = 0; i < this.width * this.height; i++) { - uint8ClampedArray[i] = converter.rgbaToUint8Converter( - source[i * 4], source[i * 4 + 1], source[i * 4 + 2], - source[i * 4 + 3]); - } - } - this.containers.push(uint8ClampedArray); - } - - return uint8ClampedArray; - } - - private convertToFloat32Array(converter: Required): - Float32Array { - let float32Array = this.getContainer(MPImageType.FLOAT32_ARRAY); - if (!float32Array) { - if (this.has(MPImageType.UINT8_CLAMPED_ARRAY)) { - const source = this.getContainer(MPImageType.UINT8_CLAMPED_ARRAY)!; - float32Array = new Float32Array(source).map( - v => converter.uint8ToFloatConverter(v)); - } else { - const source = this.convertToImageData(converter).data; - float32Array = new Float32Array(this.width * this.height); - for (let i = 0; i < this.width * this.height; i++) { - float32Array[i] = converter.rgbaToFloatConverter( - source[i * 4], source[i * 4 + 1], source[i * 4 + 2], - source[i * 4 + 3]); - } - } - this.containers.push(float32Array); - } - - return float32Array; - } - - private convertToWebGLTexture(converter: Required): - WebGLTexture { + private convertToWebGLTexture(): WebGLTexture { let webGLTexture = 
this.getContainer(MPImageType.WEBGL_TEXTURE); if (!webGLTexture) { const gl = this.getGL(); webGLTexture = this.bindTexture(); const source = this.getContainer(MPImageType.IMAGE_BITMAP) || - this.convertToImageData(converter); + this.convertToImageData(); gl.texImage2D( gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, source); this.unbindTexture(); diff --git a/mediapipe/tasks/web/vision/core/image_shader_context.ts b/mediapipe/tasks/web/vision/core/image_shader_context.ts new file mode 100644 index 000000000..eb17d001a --- /dev/null +++ b/mediapipe/tasks/web/vision/core/image_shader_context.ts @@ -0,0 +1,243 @@ +/** + * Copyright 2023 The MediaPipe Authors. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +const VERTEX_SHADER = ` + attribute vec2 aVertex; + attribute vec2 aTex; + varying vec2 vTex; + void main(void) { + gl_Position = vec4(aVertex, 0.0, 1.0); + vTex = aTex; + }`; + +const FRAGMENT_SHADER = ` + precision mediump float; + varying vec2 vTex; + uniform sampler2D inputTexture; + void main() { + gl_FragColor = texture2D(inputTexture, vTex); + } + `; + +/** Helper to assert that `value` is not null. */ +export function assertNotNull(value: T|null, msg: string): T { + if (value === null) { + throw new Error(`Unable to obtain required WebGL resource: ${msg}`); + } + return value; +} + +/** + * Utility class that encapsulates the buffers used by `MPImageShaderContext`. + * For internal use only. + */ +class MPImageShaderBuffers { + constructor( + private readonly gl: WebGL2RenderingContext, + private readonly vertexArrayObject: WebGLVertexArrayObject, + private readonly vertexBuffer: WebGLBuffer, + private readonly textureBuffer: WebGLBuffer) {} + + bind() { + this.gl.bindVertexArray(this.vertexArrayObject); + } + + unbind() { + this.gl.bindVertexArray(null); + } + + close() { + this.gl.deleteVertexArray(this.vertexArrayObject); + this.gl.deleteBuffer(this.vertexBuffer); + this.gl.deleteBuffer(this.textureBuffer); + } +} + +/** + * A class that encapsulates the shaders used by an MPImage. Can be re-used + * across MPImages that use the same WebGL2Rendering context. + * + * For internal use only. + */ +export class MPImageShaderContext { + private gl?: WebGL2RenderingContext; + private framebuffer?: WebGLFramebuffer; + private program?: WebGLProgram; + private vertexShader?: WebGLShader; + private fragmentShader?: WebGLShader; + private aVertex?: GLint; + private aTex?: GLint; + + /** + * The shader buffers used for passthrough renders that don't modify the + * input texture. + */ + private shaderBuffersPassthrough?: MPImageShaderBuffers; + + /** + * The shader buffers used for passthrough renders that flip the input texture + * vertically before conversion to a different type. This is used to flip the + * texture to the expected orientation for drawing in the browser. 
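// --- Illustrative sketch (editor's addition, not part of the diff). ---
// MPImageShaderContext is now shared by MPImage and the new MPMask so both can
// read textures back through a single lazily created framebuffer. The readback
// pattern used elsewhere in this patch looks roughly like the helper below,
// written as it would appear in a sibling module such as image.ts;
// `readTextureRgba` is a hypothetical name.
import {MPImageShaderContext} from './image_shader_context';

function readTextureRgba(
    gl: WebGL2RenderingContext, shaderContext: MPImageShaderContext,
    texture: WebGLTexture, width: number, height: number): Uint8Array {
  const pixels = new Uint8Array(width * height * 4);
  // Attach the texture to the shared framebuffer, read the pixels back to the
  // CPU, then unbind so subsequent draws are unaffected.
  shaderContext.bindFramebuffer(gl, texture);
  gl.readPixels(0, 0, width, height, gl.RGBA, gl.UNSIGNED_BYTE, pixels);
  shaderContext.unbindFramebuffer();
  return pixels;
}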
+ */ + private shaderBuffersFlipVertically?: MPImageShaderBuffers; + + private compileShader(source: string, type: number): WebGLShader { + const gl = this.gl!; + const shader = + assertNotNull(gl.createShader(type), 'Failed to create WebGL shader'); + gl.shaderSource(shader, source); + gl.compileShader(shader); + if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) { + const info = gl.getShaderInfoLog(shader); + throw new Error(`Could not compile WebGL shader: ${info}`); + } + gl.attachShader(this.program!, shader); + return shader; + } + + private setupShaders(): void { + const gl = this.gl!; + this.program = + assertNotNull(gl.createProgram()!, 'Failed to create WebGL program'); + + this.vertexShader = this.compileShader(VERTEX_SHADER, gl.VERTEX_SHADER); + this.fragmentShader = + this.compileShader(FRAGMENT_SHADER, gl.FRAGMENT_SHADER); + + gl.linkProgram(this.program); + const linked = gl.getProgramParameter(this.program, gl.LINK_STATUS); + if (!linked) { + const info = gl.getProgramInfoLog(this.program); + throw new Error(`Error during program linking: ${info}`); + } + + this.aVertex = gl.getAttribLocation(this.program, 'aVertex'); + this.aTex = gl.getAttribLocation(this.program, 'aTex'); + } + + private createBuffers(flipVertically: boolean): MPImageShaderBuffers { + const gl = this.gl!; + const vertexArrayObject = + assertNotNull(gl.createVertexArray(), 'Failed to create vertex array'); + gl.bindVertexArray(vertexArrayObject); + + const vertexBuffer = + assertNotNull(gl.createBuffer(), 'Failed to create buffer'); + gl.bindBuffer(gl.ARRAY_BUFFER, vertexBuffer); + gl.enableVertexAttribArray(this.aVertex!); + gl.vertexAttribPointer(this.aVertex!, 2, gl.FLOAT, false, 0, 0); + gl.bufferData( + gl.ARRAY_BUFFER, new Float32Array([-1, -1, -1, 1, 1, 1, 1, -1]), + gl.STATIC_DRAW); + + const textureBuffer = + assertNotNull(gl.createBuffer(), 'Failed to create buffer'); + gl.bindBuffer(gl.ARRAY_BUFFER, textureBuffer); + gl.enableVertexAttribArray(this.aTex!); + gl.vertexAttribPointer(this.aTex!, 2, gl.FLOAT, false, 0, 0); + + const bufferData = + flipVertically ? [0, 1, 0, 0, 1, 0, 1, 1] : [0, 0, 0, 1, 1, 1, 1, 0]; + gl.bufferData( + gl.ARRAY_BUFFER, new Float32Array(bufferData), gl.STATIC_DRAW); + + gl.bindBuffer(gl.ARRAY_BUFFER, null); + gl.bindVertexArray(null); + + return new MPImageShaderBuffers( + gl, vertexArrayObject, vertexBuffer, textureBuffer); + } + + private getShaderBuffers(flipVertically: boolean): MPImageShaderBuffers { + if (flipVertically) { + if (!this.shaderBuffersFlipVertically) { + this.shaderBuffersFlipVertically = + this.createBuffers(/* flipVertically= */ true); + } + return this.shaderBuffersFlipVertically; + } else { + if (!this.shaderBuffersPassthrough) { + this.shaderBuffersPassthrough = + this.createBuffers(/* flipVertically= */ false); + } + return this.shaderBuffersPassthrough; + } + } + + private maybeInitGL(gl: WebGL2RenderingContext): void { + if (!this.gl) { + this.gl = gl; + } else if (gl !== this.gl) { + throw new Error('Cannot change GL context once initialized'); + } + } + + /** Runs the callback using the shader. */ + run( + gl: WebGL2RenderingContext, flipVertically: boolean, + callback: () => T): T { + this.maybeInitGL(gl); + + if (!this.program) { + this.setupShaders(); + } + + const shaderBuffers = this.getShaderBuffers(flipVertically); + gl.useProgram(this.program!); + shaderBuffers.bind(); + const result = callback(); + shaderBuffers.unbind(); + + return result; + } + + /** + * Binds a framebuffer to the canvas. 
If the framebuffer does not yet exist, + * creates it first. Binds the provided texture to the framebuffer. + */ + bindFramebuffer(gl: WebGL2RenderingContext, texture: WebGLTexture): void { + this.maybeInitGL(gl); + if (!this.framebuffer) { + this.framebuffer = + assertNotNull(gl.createFramebuffer(), 'Failed to create framebuffe.'); + } + gl.bindFramebuffer(gl.FRAMEBUFFER, this.framebuffer); + gl.framebufferTexture2D( + gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, texture, 0); + } + + unbindFramebuffer(): void { + this.gl?.bindFramebuffer(this.gl.FRAMEBUFFER, null); + } + + close() { + if (this.program) { + const gl = this.gl!; + gl.deleteProgram(this.program); + gl.deleteShader(this.vertexShader!); + gl.deleteShader(this.fragmentShader!); + } + if (this.framebuffer) { + this.gl!.deleteFramebuffer(this.framebuffer); + } + if (this.shaderBuffersPassthrough) { + this.shaderBuffersPassthrough.close(); + } + if (this.shaderBuffersFlipVertically) { + this.shaderBuffersFlipVertically.close(); + } + } +} diff --git a/mediapipe/tasks/web/vision/core/mask.test.ts b/mediapipe/tasks/web/vision/core/mask.test.ts new file mode 100644 index 000000000..b632f2dc5 --- /dev/null +++ b/mediapipe/tasks/web/vision/core/mask.test.ts @@ -0,0 +1,269 @@ +/** + * Copyright 2022 The MediaPipe Authors. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import 'jasmine'; + +import {MPImageShaderContext} from './image_shader_context'; +import {MPMask} from './mask'; + +const WIDTH = 2; +const HEIGHT = 2; + +const skip = typeof document === 'undefined'; +if (skip) { + console.log('These tests must be run in a browser.'); +} + +/** The mask types supported by MPMask. */ +type MaskType = Uint8Array|Float32Array|WebGLTexture; + +const MASK_2_1 = [1, 2]; +const MASK_2_2 = [1, 2, 3, 4]; +const MASK_2_3 = [1, 2, 3, 4, 5, 6]; + +/** The test images and data to use for the unit tests below. */ +class MPMaskTestContext { + canvas!: OffscreenCanvas; + gl!: WebGL2RenderingContext; + uint8Array!: Uint8Array; + float32Array!: Float32Array; + webGLTexture!: WebGLTexture; + + async init(pixels = MASK_2_2, width = WIDTH, height = HEIGHT): Promise { + // Initialize a canvas with default dimensions. Note that the canvas size + // can be different from the mask size. 
+ this.canvas = new OffscreenCanvas(WIDTH, HEIGHT); + this.gl = this.canvas.getContext('webgl2') as WebGL2RenderingContext; + + const gl = this.gl; + if (!gl.getExtension('EXT_color_buffer_float')) { + throw new Error('Missing required EXT_color_buffer_float extension'); + } + + this.uint8Array = new Uint8Array(pixels); + this.float32Array = new Float32Array(pixels.length); + for (let i = 0; i < this.uint8Array.length; ++i) { + this.float32Array[i] = pixels[i] / 255; + } + + this.webGLTexture = gl.createTexture()!; + + gl.bindTexture(gl.TEXTURE_2D, this.webGLTexture); + gl.texImage2D( + gl.TEXTURE_2D, 0, gl.R32F, width, height, 0, gl.RED, gl.FLOAT, + new Float32Array(pixels).map(v => v / 255)); + gl.bindTexture(gl.TEXTURE_2D, null); + } + + get(type: unknown) { + switch (type) { + case Uint8Array: + return this.uint8Array; + case Float32Array: + return this.float32Array; + case WebGLTexture: + return this.webGLTexture; + default: + throw new Error(`Unsupported type: ${type}`); + } + } + + close(): void { + this.gl.deleteTexture(this.webGLTexture); + } +} + +(skip ? xdescribe : describe)('MPMask', () => { + const context = new MPMaskTestContext(); + + afterEach(() => { + context.close(); + }); + + function readPixelsFromWebGLTexture(texture: WebGLTexture): Float32Array { + const pixels = new Float32Array(WIDTH * HEIGHT); + + const gl = context.gl; + gl.bindTexture(gl.TEXTURE_2D, texture); + + const framebuffer = gl.createFramebuffer()!; + gl.bindFramebuffer(gl.FRAMEBUFFER, framebuffer); + gl.framebufferTexture2D( + gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, texture, 0); + gl.readPixels(0, 0, WIDTH, HEIGHT, gl.RED, gl.FLOAT, pixels); + gl.bindFramebuffer(gl.FRAMEBUFFER, null); + gl.deleteFramebuffer(framebuffer); + + gl.bindTexture(gl.TEXTURE_2D, null); + + // Sanity check values + expect(pixels[0]).not.toBe(0); + + return pixels; + } + + function assertEquality(mask: MPMask, expected: MaskType): void { + if (expected instanceof Uint8Array) { + const result = mask.getAsUint8Array(); + expect(result).toEqual(expected); + } else if (expected instanceof Float32Array) { + const result = mask.getAsFloat32Array(); + expect(result).toEqual(expected); + } else { // WebGLTexture + const result = mask.getAsWebGLTexture(); + expect(readPixelsFromWebGLTexture(result)) + .toEqual(readPixelsFromWebGLTexture(expected)); + } + } + + function createImage( + shaderContext: MPImageShaderContext, input: MaskType, width: number, + height: number): MPMask { + return new MPMask( + [input], + /* ownsWebGLTexture= */ false, context.canvas, shaderContext, width, + height); + } + + function runConversionTest( + input: MaskType, output: MaskType, width = WIDTH, height = HEIGHT): void { + const shaderContext = new MPImageShaderContext(); + const mask = createImage(shaderContext, input, width, height); + assertEquality(mask, output); + mask.close(); + shaderContext.close(); + } + + function runCloneTest(input: MaskType): void { + const shaderContext = new MPImageShaderContext(); + const mask = createImage(shaderContext, input, WIDTH, HEIGHT); + const clone = mask.clone(); + assertEquality(clone, input); + clone.close(); + shaderContext.close(); + } + + const sources = skip ? 
[] : [Uint8Array, Float32Array, WebGLTexture]; + + for (let i = 0; i < sources.length; i++) { + for (let j = 0; j < sources.length; j++) { + it(`converts from ${sources[i].name} to ${sources[j].name}`, async () => { + await context.init(); + runConversionTest(context.get(sources[i]), context.get(sources[j])); + }); + } + } + + for (let i = 0; i < sources.length; i++) { + it(`clones ${sources[i].name}`, async () => { + await context.init(); + runCloneTest(context.get(sources[i])); + }); + } + + it(`does not flip textures twice`, async () => { + await context.init(); + + const shaderContext = new MPImageShaderContext(); + const mask = new MPMask( + [context.webGLTexture], + /* ownsWebGLTexture= */ false, context.canvas, shaderContext, WIDTH, + HEIGHT); + + const result = mask.clone().getAsUint8Array(); + expect(result).toEqual(context.uint8Array); + shaderContext.close(); + }); + + it(`can clone and get mask`, async () => { + await context.init(); + + const shaderContext = new MPImageShaderContext(); + const mask = new MPMask( + [context.webGLTexture], + /* ownsWebGLTexture= */ false, context.canvas, shaderContext, WIDTH, + HEIGHT); + + // Verify that we can mix the different shader modes by running them out of + // order. + let result = mask.getAsUint8Array(); + expect(result).toEqual(context.uint8Array); + + result = mask.clone().getAsUint8Array(); + expect(result).toEqual(context.uint8Array); + + result = mask.getAsUint8Array(); + expect(result).toEqual(context.uint8Array); + + shaderContext.close(); + }); + + it('supports has()', async () => { + await context.init(); + + const shaderContext = new MPImageShaderContext(); + const mask = createImage(shaderContext, context.uint8Array, WIDTH, HEIGHT); + + expect(mask.hasUint8Array()).toBe(true); + expect(mask.hasFloat32Array()).toBe(false); + expect(mask.hasWebGLTexture()).toBe(false); + + mask.getAsFloat32Array(); + + expect(mask.hasUint8Array()).toBe(true); + expect(mask.hasFloat32Array()).toBe(true); + expect(mask.hasWebGLTexture()).toBe(false); + + mask.getAsWebGLTexture(); + + expect(mask.hasUint8Array()).toBe(true); + expect(mask.hasFloat32Array()).toBe(true); + expect(mask.hasWebGLTexture()).toBe(true); + + mask.close(); + shaderContext.close(); + }); + + it('supports mask that is smaller than the canvas', async () => { + await context.init(MASK_2_1, /* width= */ 2, /* height= */ 1); + + runConversionTest( + context.uint8Array, context.webGLTexture, /* width= */ 2, + /* height= */ 1); + runConversionTest( + context.webGLTexture, context.float32Array, /* width= */ 2, + /* height= */ 1); + runConversionTest( + context.float32Array, context.uint8Array, /* width= */ 2, + /* height= */ 1); + + context.close(); + }); + + it('supports mask that is larger than the canvas', async () => { + await context.init(MASK_2_3, /* width= */ 2, /* height= */ 3); + + runConversionTest( + context.uint8Array, context.webGLTexture, /* width= */ 2, + /* height= */ 3); + runConversionTest( + context.webGLTexture, context.float32Array, /* width= */ 2, + /* height= */ 3); + runConversionTest( + context.float32Array, context.uint8Array, /* width= */ 2, + /* height= */ 3); + }); +}); diff --git a/mediapipe/tasks/web/vision/core/mask.ts b/mediapipe/tasks/web/vision/core/mask.ts new file mode 100644 index 000000000..da14f104f --- /dev/null +++ b/mediapipe/tasks/web/vision/core/mask.ts @@ -0,0 +1,315 @@ +/** + * Copyright 2023 The MediaPipe Authors. 
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import {assertNotNull, MPImageShaderContext} from '../../../../tasks/web/vision/core/image_shader_context';
+
+/** The underlying type of the mask. */
+enum MPMaskType {
+  /** Represents the native `Uint8Array` type. */
+  UINT8_ARRAY,
+  /** Represents the native `Float32Array` type. */
+  FLOAT32_ARRAY,
+  /** Represents the native `WebGLTexture` type. */
+  WEBGL_TEXTURE
+}
+
+/** The supported mask formats. For internal usage. */
+export type MPMaskContainer = Uint8Array|Float32Array|WebGLTexture;
+
+/**
+ * The wrapper class for MediaPipe segmentation masks.
+ *
+ * Masks are stored as `Uint8Array`, `Float32Array` or `WebGLTexture` objects.
+ * You can convert the underlying type to any other type by passing the desired
+ * type to `getAs...()`. As type conversions can be expensive, it is recommended
+ * to limit these conversions. You can verify what underlying types are already
+ * available by invoking `has...()`.
+ *
+ * Masks that are returned from a MediaPipe Task are owned by the
+ * underlying C++ Task. If you need to extend the lifetime of these objects,
+ * you can invoke the `clone()` method. To free up the resources obtained
+ * during any clone or type conversion operation, it is important to invoke
+ * `close()` on the `MPMask` instance.
+ */
+export class MPMask {
+  private gl?: WebGL2RenderingContext;
+
+  /** @hideconstructor */
+  constructor(
+      private readonly containers: MPMaskContainer[],
+      private ownsWebGLTexture: boolean,
+      /** Returns the canvas element that the mask is bound to. */
+      readonly canvas: HTMLCanvasElement|OffscreenCanvas|undefined,
+      private shaderContext: MPImageShaderContext|undefined,
+      /** Returns the width of the mask. */
+      readonly width: number,
+      /** Returns the height of the mask. */
+      readonly height: number,
+  ) {}
+
+  /** Returns whether this `MPMask` contains a mask of type `Uint8Array`. */
+  hasUint8Array(): boolean {
+    return !!this.getContainer(MPMaskType.UINT8_ARRAY);
+  }
+
+  /** Returns whether this `MPMask` contains a mask of type `Float32Array`. */
+  hasFloat32Array(): boolean {
+    return !!this.getContainer(MPMaskType.FLOAT32_ARRAY);
+  }
+
+  /** Returns whether this `MPMask` contains a mask of type `WebGLTexture`. */
+  hasWebGLTexture(): boolean {
+    return !!this.getContainer(MPMaskType.WEBGL_TEXTURE);
+  }
+
+  /**
+   * Returns the underlying mask as a `Uint8Array`. Note that this involves an
+   * expensive GPU to CPU transfer if the current mask is only available as a
+   * `WebGLTexture`.
+   *
+   * @return The current data as a Uint8Array.
+   */
+  getAsUint8Array(): Uint8Array {
+    return this.convertToUint8Array();
+  }
+
+  /**
+   * Returns the underlying mask as a single channel `Float32Array`. Note that
+   * this involves an expensive GPU to CPU transfer if the current mask is only
+   * available as a `WebGLTexture`.
+   *
+   * @return The current mask as a Float32Array.
+   */
+  getAsFloat32Array(): Float32Array {
+    return this.convertToFloat32Array();
+  }
+
+  /**
+   * Returns the underlying mask as a `WebGLTexture` object. Note that this
+   * involves a CPU to GPU transfer if the current mask is only available as
+   * a CPU array. The returned texture is bound to the current canvas (see
+   * `.canvas`).
+   *
+   * @return The current mask as a WebGLTexture.
+   */
+  getAsWebGLTexture(): WebGLTexture {
+    return this.convertToWebGLTexture();
+  }
+
+  private getContainer(type: MPMaskType.UINT8_ARRAY): Uint8Array|undefined;
+  private getContainer(type: MPMaskType.FLOAT32_ARRAY): Float32Array|undefined;
+  private getContainer(type: MPMaskType.WEBGL_TEXTURE): WebGLTexture|undefined;
+  private getContainer(type: MPMaskType): MPMaskContainer|undefined;
+  /** Returns the container for the requested storage type iff it exists. */
+  private getContainer(type: MPMaskType): MPMaskContainer|undefined {
+    switch (type) {
+      case MPMaskType.UINT8_ARRAY:
+        return this.containers.find(img => img instanceof Uint8Array);
+      case MPMaskType.FLOAT32_ARRAY:
+        return this.containers.find(img => img instanceof Float32Array);
+      case MPMaskType.WEBGL_TEXTURE:
+        return this.containers.find(
+            img => typeof WebGLTexture !== 'undefined' &&
+                img instanceof WebGLTexture);
+      default:
+        throw new Error(`Type is not supported: ${type}`);
+    }
+  }
+
+  /**
+   * Creates a copy of the resources stored in this `MPMask`. You can
+   * invoke this method to extend the lifetime of a mask returned by a
+   * MediaPipe Task. Note that performance-critical applications should aim to
+   * only use the `MPMask` within the MediaPipe Task callback so that
+   * copies can be avoided.
+   */
+  clone(): MPMask {
+    const destinationContainers: MPMaskContainer[] = [];
+
+    // TODO: We might only want to clone one backing datastructure
+    // even if multiple are defined.
+    for (const container of this.containers) {
+      let destinationContainer: MPMaskContainer;
+
+      if (container instanceof Uint8Array) {
+        destinationContainer = new Uint8Array(container);
+      } else if (container instanceof Float32Array) {
+        destinationContainer = new Float32Array(container);
+      } else if (container instanceof WebGLTexture) {
+        const gl = this.getGL();
+        const shaderContext = this.getShaderContext();
+
+        // Create a new texture and use it to back a framebuffer
+        gl.activeTexture(gl.TEXTURE1);
+        destinationContainer =
+            assertNotNull(gl.createTexture(), 'Failed to create texture');
+        gl.bindTexture(gl.TEXTURE_2D, destinationContainer);
+        gl.texImage2D(
+            gl.TEXTURE_2D, 0, gl.R32F, this.width, this.height, 0, gl.RED,
+            gl.FLOAT, null);
+        gl.bindTexture(gl.TEXTURE_2D, null);
+
+        shaderContext.bindFramebuffer(gl, destinationContainer);
+        shaderContext.run(gl, /* flipVertically= */ false, () => {
+          this.bindTexture();  // This activates gl.TEXTURE0
+          gl.clearColor(0, 0, 0, 0);
+          gl.clear(gl.COLOR_BUFFER_BIT);
+          gl.drawArrays(gl.TRIANGLE_FAN, 0, 4);
+          this.unbindTexture();
+        });
+        shaderContext.unbindFramebuffer();
+
+        this.unbindTexture();
+      } else {
+        throw new Error(`Type is not supported: ${container}`);
+      }
+
+      destinationContainers.push(destinationContainer);
+    }
+
+    return new MPMask(
+        destinationContainers, this.hasWebGLTexture(), this.canvas,
+        this.shaderContext, this.width, this.height);
+  }
+
+  private getGL(): WebGL2RenderingContext {
+    if (!this.canvas) {
+      throw new Error(
+          'Conversion to different mask formats requires that a canvas ' +
+          'is passed when initializing the mask.');
+    }
+    if (!this.gl) {
+      this.gl = assertNotNull(
+
this.canvas.getContext('webgl2') as WebGL2RenderingContext | null, + 'You cannot use a canvas that is already bound to a different ' + + 'type of rendering context.'); + } + const ext = this.gl.getExtension('EXT_color_buffer_float'); + if (!ext) { + // TODO: Ensure this works on iOS + throw new Error('Missing required EXT_color_buffer_float extension'); + } + return this.gl; + } + + private getShaderContext(): MPImageShaderContext { + if (!this.shaderContext) { + this.shaderContext = new MPImageShaderContext(); + } + return this.shaderContext; + } + + private convertToFloat32Array(): Float32Array { + let float32Array = this.getContainer(MPMaskType.FLOAT32_ARRAY); + if (!float32Array) { + const uint8Array = this.getContainer(MPMaskType.UINT8_ARRAY); + if (uint8Array) { + float32Array = new Float32Array(uint8Array).map(v => v / 255); + } else { + const gl = this.getGL(); + const shaderContext = this.getShaderContext(); + float32Array = new Float32Array(this.width * this.height); + + // Create texture if needed + const webGlTexture = this.convertToWebGLTexture(); + + // Create a framebuffer from the texture and read back pixels + shaderContext.bindFramebuffer(gl, webGlTexture); + gl.readPixels( + 0, 0, this.width, this.height, gl.RED, gl.FLOAT, float32Array); + shaderContext.unbindFramebuffer(); + } + this.containers.push(float32Array); + } + + return float32Array; + } + + private convertToUint8Array(): Uint8Array { + let uint8Array = this.getContainer(MPMaskType.UINT8_ARRAY); + if (!uint8Array) { + const floatArray = this.convertToFloat32Array(); + uint8Array = new Uint8Array(floatArray.map(v => 255 * v)); + this.containers.push(uint8Array); + } + return uint8Array; + } + + private convertToWebGLTexture(): WebGLTexture { + let webGLTexture = this.getContainer(MPMaskType.WEBGL_TEXTURE); + if (!webGLTexture) { + const gl = this.getGL(); + webGLTexture = this.bindTexture(); + + const data = this.convertToFloat32Array(); + // TODO: Add support for R16F to support iOS + gl.texImage2D( + gl.TEXTURE_2D, 0, gl.R32F, this.width, this.height, 0, gl.RED, + gl.FLOAT, data); + this.unbindTexture(); + } + + return webGLTexture; + } + + /** + * Binds the backing texture to the canvas. If the texture does not yet + * exist, creates it first. + */ + private bindTexture(): WebGLTexture { + const gl = this.getGL(); + + gl.viewport(0, 0, this.width, this.height); + gl.activeTexture(gl.TEXTURE0); + + let webGLTexture = this.getContainer(MPMaskType.WEBGL_TEXTURE); + if (!webGLTexture) { + webGLTexture = + assertNotNull(gl.createTexture(), 'Failed to create texture'); + this.containers.push(webGLTexture); + this.ownsWebGLTexture = true; + } + + gl.bindTexture(gl.TEXTURE_2D, webGLTexture); + // TODO: Ideally, we would only set these once per texture and + // not once every frame. + gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE); + gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE); + gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST); + gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST); + + return webGLTexture; + } + + private unbindTexture(): void { + this.gl!.bindTexture(this.gl!.TEXTURE_2D, null); + } + + /** + * Frees up any resources owned by this `MPMask` instance. + * + * Note that this method does not free masks that are owned by the C++ + * Task, as these are freed automatically once you leave the MediaPipe + * callback. Additionally, some shared state is freed only once you invoke + * the Task's `close()` method. 
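To make the intended call pattern concrete, here is a hedged usage sketch for `MPMask` based on the accessors documented above. The callback wiring and variable names are assumptions, and `MPMask` is assumed to be imported from the package entry point (it is re-exported in the `index.ts` change further below).

```ts
// Sketch: consuming an MPMask delivered to a Task callback. Inside the
// callback the mask is owned by the C++ Task and only valid until the
// callback returns.
function onMask(mask: MPMask): void {
  // Prefer a representation that already exists; converting a texture-only
  // mask to a CPU array triggers an expensive GPU-to-CPU readback.
  const values: Uint8Array|Float32Array = mask.hasUint8Array()
      ? mask.getAsUint8Array()
      : mask.getAsFloat32Array();
  console.log(`mask ${mask.width}x${mask.height}, first value ${values[0]}`);
}

// To keep a mask beyond the callback, clone it, and close() the clone when
// you are done with it to free the copied resources.
let retained: MPMask|undefined;
function retain(mask: MPMask): void {
  retained?.close();
  retained = mask.clone();
}
```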
+ */ + close(): void { + if (this.ownsWebGLTexture) { + const gl = this.getGL(); + gl.deleteTexture(this.getContainer(MPMaskType.WEBGL_TEXTURE)!); + } + } +} diff --git a/mediapipe/tasks/web/vision/core/render_utils.ts b/mediapipe/tasks/web/vision/core/render_utils.ts index 05f2a4df1..ebb3be16a 100644 --- a/mediapipe/tasks/web/vision/core/render_utils.ts +++ b/mediapipe/tasks/web/vision/core/render_utils.ts @@ -16,8 +16,6 @@ * limitations under the License. */ -import {MPImageChannelConverter} from '../../../../tasks/web/vision/core/image'; - // Pre-baked color table for a maximum of 12 classes. const CM_ALPHA = 128; const COLOR_MAP: Array<[number, number, number, number]> = [ @@ -35,8 +33,37 @@ const COLOR_MAP: Array<[number, number, number, number]> = [ [255, 255, 255, CM_ALPHA] // class 11 is white; could do black instead? ]; -/** The color converter we use in our demos. */ -export const RENDER_UTIL_CONVERTER: MPImageChannelConverter = { - floatToRGBAConverter: v => [128, 0, 0, v * 255], - uint8ToRGBAConverter: v => COLOR_MAP[v % COLOR_MAP.length], -}; + +/** Helper function to draw a confidence mask */ +export function drawConfidenceMask( + ctx: CanvasRenderingContext2D, image: Float32Array, width: number, + height: number): void { + const uint8Array = new Uint8ClampedArray(width * height * 4); + for (let i = 0; i < image.length; i++) { + uint8Array[4 * i] = 128; + uint8Array[4 * i + 1] = 0; + uint8Array[4 * i + 2] = 0; + uint8Array[4 * i + 3] = image[i] * 255; + } + ctx.putImageData(new ImageData(uint8Array, width, height), 0, 0); +} + +/** + * Helper function to draw a category mask. For GPU, we only have F32Arrays + * for now. + */ +export function drawCategoryMask( + ctx: CanvasRenderingContext2D, image: Uint8Array|Float32Array, + width: number, height: number): void { + const rgbaArray = new Uint8ClampedArray(width * height * 4); + const isFloatArray = image instanceof Float32Array; + for (let i = 0; i < image.length; i++) { + const colorIndex = isFloatArray ? Math.round(image[i] * 255) : image[i]; + const color = COLOR_MAP[colorIndex % COLOR_MAP.length]; + rgbaArray[4 * i] = color[0]; + rgbaArray[4 * i + 1] = color[1]; + rgbaArray[4 * i + 2] = color[2]; + rgbaArray[4 * i + 3] = color[3]; + } + ctx.putImageData(new ImageData(rgbaArray, width, height), 0, 0); +} diff --git a/mediapipe/tasks/web/vision/core/types.d.ts b/mediapipe/tasks/web/vision/core/types.d.ts index c985a9f36..64d67bc30 100644 --- a/mediapipe/tasks/web/vision/core/types.d.ts +++ b/mediapipe/tasks/web/vision/core/types.d.ts @@ -19,7 +19,10 @@ import {NormalizedKeypoint} from '../../../../tasks/web/components/containers/ke /** A Region-Of-Interest (ROI) to represent a region within an image. */ export declare interface RegionOfInterest { /** The ROI in keypoint format. */ - keypoint: NormalizedKeypoint; + keypoint?: NormalizedKeypoint; + + /** The ROI as scribbles over the object that the user wants to segment. */ + scribble?: NormalizedKeypoint[]; } /** A connection between two landmarks. 
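The two demo helpers above pair naturally with the new `MPMask` accessors. Here is a sketch of how a demo page might wire them up, assuming an `imageSegmenter` instance, an `imageElement`, and an overlay canvas (all illustrative); note that these helpers live in `render_utils.ts` and are used by the MediaPipe demos rather than being part of the Tasks API.

```ts
// Sketch: overlaying segmentation output using the demo render helpers.
const overlay = document.getElementById('overlay') as HTMLCanvasElement;
const ctx = overlay.getContext('2d')!;

imageSegmenter.segment(imageElement, result => {
  if (result.categoryMask) {
    // Category masks map each pixel to a class index; the helper colors them
    // from the pre-baked color table above.
    drawCategoryMask(
        ctx, result.categoryMask.getAsUint8Array(), result.categoryMask.width,
        result.categoryMask.height);
  } else if (result.confidenceMasks?.length) {
    // Confidence masks hold per-pixel probabilities in [0, 1].
    const mask = result.confidenceMasks[0];
    drawConfidenceMask(ctx, mask.getAsFloat32Array(), mask.width, mask.height);
  }
});
```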
*/ diff --git a/mediapipe/tasks/web/vision/core/vision_task_runner.ts b/mediapipe/tasks/web/vision/core/vision_task_runner.ts index 285dbf900..f8f7826d0 100644 --- a/mediapipe/tasks/web/vision/core/vision_task_runner.ts +++ b/mediapipe/tasks/web/vision/core/vision_task_runner.ts @@ -17,8 +17,10 @@ import {NormalizedRect} from '../../../../framework/formats/rect_pb'; import {TaskRunner} from '../../../../tasks/web/core/task_runner'; import {WasmFileset} from '../../../../tasks/web/core/wasm_fileset'; -import {MPImage, MPImageShaderContext} from '../../../../tasks/web/vision/core/image'; +import {MPImage} from '../../../../tasks/web/vision/core/image'; import {ImageProcessingOptions} from '../../../../tasks/web/vision/core/image_processing_options'; +import {MPImageShaderContext} from '../../../../tasks/web/vision/core/image_shader_context'; +import {MPMask} from '../../../../tasks/web/vision/core/mask'; import {GraphRunner, ImageSource, WasmMediaPipeConstructor} from '../../../../web/graph_runner/graph_runner'; import {SupportImage, WasmImage} from '../../../../web/graph_runner/graph_runner_image_lib'; import {isWebKit} from '../../../../web/graph_runner/platform_utils'; @@ -57,11 +59,6 @@ export abstract class VisionTaskRunner extends TaskRunner { protected static async createVisionInstance( type: WasmMediaPipeConstructor, fileset: WasmFileset, options: VisionTaskOptions): Promise { - if (options.baseOptions?.delegate === 'GPU') { - if (!options.canvas) { - throw new Error('You must specify a canvas for GPU processing.'); - } - } const canvas = options.canvas ?? createCanvas(); return TaskRunner.createInstance(type, canvas, fileset, options); } @@ -225,19 +222,18 @@ export abstract class VisionTaskRunner extends TaskRunner { /** * Converts a WasmImage to an MPImage. * - * Converts the underlying Uint8ClampedArray-backed images to ImageData + * Converts the underlying Uint8Array-backed images to ImageData * (adding an alpha channel if necessary), passes through WebGLTextures and * throws for Float32Array-backed images. 
*/ - protected convertToMPImage(wasmImage: WasmImage): MPImage { + protected convertToMPImage(wasmImage: WasmImage, shouldCopyData: boolean): + MPImage { const {data, width, height} = wasmImage; const pixels = width * height; - let container: ImageData|WebGLTexture|Uint8ClampedArray; - if (data instanceof Uint8ClampedArray) { - if (data.length === pixels) { - container = data; // Mask - } else if (data.length === pixels * 3) { + let container: ImageData|WebGLTexture; + if (data instanceof Uint8Array) { + if (data.length === pixels * 3) { // TODO: Convert in C++ const rgba = new Uint8ClampedArray(pixels * 4); for (let i = 0; i < pixels; ++i) { @@ -247,25 +243,48 @@ export abstract class VisionTaskRunner extends TaskRunner { rgba[4 * i + 3] = 255; } container = new ImageData(rgba, width, height); - } else if (data.length ===pixels * 4) { - container = new ImageData(data, width, height); + } else if (data.length === pixels * 4) { + container = new ImageData( + new Uint8ClampedArray(data.buffer, data.byteOffset, data.length), + width, height); } else { throw new Error(`Unsupported channel count: ${data.length/pixels}`); } - } else if (data instanceof Float32Array) { + } else if (data instanceof WebGLTexture) { + container = data; + } else { + throw new Error(`Unsupported format: ${data.constructor.name}`); + } + + const image = new MPImage( + [container], /* ownsImageBitmap= */ false, + /* ownsWebGLTexture= */ false, this.graphRunner.wasmModule.canvas!, + this.shaderContext, width, height); + return shouldCopyData ? image.clone() : image; + } + + /** Converts a WasmImage to an MPMask. */ + protected convertToMPMask(wasmImage: WasmImage, shouldCopyData: boolean): + MPMask { + const {data, width, height} = wasmImage; + const pixels = width * height; + + let container: WebGLTexture|Uint8Array|Float32Array; + if (data instanceof Uint8Array || data instanceof Float32Array) { if (data.length === pixels) { - container = data; // Mask + container = data; } else { - throw new Error(`Unsupported channel count: ${data.length/pixels}`); + throw new Error(`Unsupported channel count: ${data.length / pixels}`); } - } else { // WebGLTexture + } else { container = data; } - return new MPImage( - [container], /* ownsImageBitmap= */ false, /* ownsWebGLTexture= */ false, - this.graphRunner.wasmModule.canvas!, this.shaderContext, width, - height); + const mask = new MPMask( + [container], + /* ownsWebGLTexture= */ false, this.graphRunner.wasmModule.canvas!, + this.shaderContext, width, height); + return shouldCopyData ? mask.clone() : mask; } /** Closes and cleans up the resources held by this task. 
*/ diff --git a/mediapipe/tasks/web/vision/face_stylizer/BUILD b/mediapipe/tasks/web/vision/face_stylizer/BUILD index 0c0167dbd..fe9146987 100644 --- a/mediapipe/tasks/web/vision/face_stylizer/BUILD +++ b/mediapipe/tasks/web/vision/face_stylizer/BUILD @@ -47,7 +47,6 @@ mediapipe_ts_library( "//mediapipe/framework:calculator_jspb_proto", "//mediapipe/tasks/web/core", "//mediapipe/tasks/web/core:task_runner_test_utils", - "//mediapipe/tasks/web/vision/core:image", "//mediapipe/web/graph_runner:graph_runner_image_lib_ts", ], ) diff --git a/mediapipe/tasks/web/vision/face_stylizer/face_stylizer.ts b/mediapipe/tasks/web/vision/face_stylizer/face_stylizer.ts index 2a9adb315..8169e6775 100644 --- a/mediapipe/tasks/web/vision/face_stylizer/face_stylizer.ts +++ b/mediapipe/tasks/web/vision/face_stylizer/face_stylizer.ts @@ -50,7 +50,8 @@ export type FaceStylizerCallback = (image: MPImage|null) => void; /** Performs face stylization on images. */ export class FaceStylizer extends VisionTaskRunner { - private userCallback: FaceStylizerCallback = () => {}; + private userCallback?: FaceStylizerCallback; + private result?: MPImage|null; private readonly options: FaceStylizerGraphOptionsProto; /** @@ -130,21 +131,58 @@ export class FaceStylizer extends VisionTaskRunner { return super.applyOptions(options); } - /** - * Performs face stylization on the provided single image. The method returns - * synchronously once the callback returns. Only use this method when the - * FaceStylizer is created with the image running mode. + * Performs face stylization on the provided single image and invokes the + * callback with result. The method returns synchronously once the callback + * returns. Only use this method when the FaceStylizer is created with the + * image running mode. * * @param image An image to process. - * @param callback The callback that is invoked with the stylized image. The - * lifetime of the returned data is only guaranteed for the duration of the - * callback. + * @param callback The callback that is invoked with the stylized image or + * `null` if no face was detected. The lifetime of the returned data is + * only guaranteed for the duration of the callback. */ stylize(image: ImageSource, callback: FaceStylizerCallback): void; /** - * Performs face stylization on the provided single image. The method returns - * synchronously once the callback returns. Only use this method when the + * Performs face stylization on the provided single image and invokes the + * callback with result. The method returns synchronously once the callback + * returns. Only use this method when the FaceStylizer is created with the + * image running mode. + * + * The 'imageProcessingOptions' parameter can be used to specify one or all + * of: + * - the rotation to apply to the image before performing stylization, by + * setting its 'rotationDegrees' property. + * - the region-of-interest on which to perform stylization, by setting its + * 'regionOfInterest' property. If not specified, the full image is used. + * If both are specified, the crop around the region-of-interest is extracted + * first, then the specified rotation is applied to the crop. + * + * @param image An image to process. + * @param imageProcessingOptions the `ImageProcessingOptions` specifying how + * to process the input image before running inference. + * @param callback The callback that is invoked with the stylized image or + * `null` if no face was detected. 
The lifetime of the returned data is + * only guaranteed for the duration of the callback. + */ + stylize( + image: ImageSource, imageProcessingOptions: ImageProcessingOptions, + callback: FaceStylizerCallback): void; + /** + * Performs face stylization on the provided single image and returns the + * result. This method creates a copy of the resulting image and should not be + * used in high-throughput applictions. Only use this method when the + * FaceStylizer is created with the image running mode. + * + * @param image An image to process. + * @return A stylized face or `null` if no face was detected. The result is + * copied to avoid lifetime issues. + */ + stylize(image: ImageSource): MPImage|null; + /** + * Performs face stylization on the provided single image and returns the + * result. This method creates a copy of the resulting image and should not be + * used in high-throughput applictions. Only use this method when the * FaceStylizer is created with the image running mode. * * The 'imageProcessingOptions' parameter can be used to specify one or all @@ -159,18 +197,16 @@ export class FaceStylizer extends VisionTaskRunner { * @param image An image to process. * @param imageProcessingOptions the `ImageProcessingOptions` specifying how * to process the input image before running inference. - * @param callback The callback that is invoked with the stylized image. The - * lifetime of the returned data is only guaranteed for the duration of the - * callback. + * @return A stylized face or `null` if no face was detected. The result is + * copied to avoid lifetime issues. */ - stylize( - image: ImageSource, imageProcessingOptions: ImageProcessingOptions, - callback: FaceStylizerCallback): void; + stylize(image: ImageSource, imageProcessingOptions: ImageProcessingOptions): + MPImage|null; stylize( image: ImageSource, - imageProcessingOptionsOrCallback: ImageProcessingOptions| + imageProcessingOptionsOrCallback?: ImageProcessingOptions| FaceStylizerCallback, - callback?: FaceStylizerCallback): void { + callback?: FaceStylizerCallback): MPImage|null|void { const imageProcessingOptions = typeof imageProcessingOptionsOrCallback !== 'function' ? imageProcessingOptionsOrCallback : @@ -178,14 +214,19 @@ export class FaceStylizer extends VisionTaskRunner { this.userCallback = typeof imageProcessingOptionsOrCallback === 'function' ? imageProcessingOptionsOrCallback : - callback!; + callback; this.processImageData(image, imageProcessingOptions ?? {}); - this.userCallback = () => {}; + + if (!this.userCallback) { + return this.result; + } } /** - * Performs face stylization on the provided video frame. Only use this method - * when the FaceStylizer is created with the video running mode. + * Performs face stylization on the provided video frame and invokes the + * callback with result. The method returns synchronously once the callback + * returns. Only use this method when the FaceStylizer is created with the + * video running mode. * * The input frame can be of any size. It's required to provide the video * frame's timestamp (in milliseconds). The input timestamps must be @@ -193,16 +234,18 @@ export class FaceStylizer extends VisionTaskRunner { * * @param videoFrame A video frame to process. * @param timestamp The timestamp of the current frame, in ms. - * @param callback The callback that is invoked with the stylized image. The - * lifetime of the returned data is only guaranteed for the duration of - * the callback. 
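Since the overloads above add a synchronous variant next to the existing callback form, a short hedged sketch of both calling styles may help; `faceStylizer` and `imageElement` are assumed to exist, and the copy/lifetime behavior follows the doc comments above.

```ts
// Callback style: the stylized MPImage is only valid inside the callback.
faceStylizer.stylize(imageElement, stylized => {
  if (stylized) {
    // e.g. draw the stylized face to a canvas here
    console.log(`stylized ${stylized.width}x${stylized.height}`);
  }
});

// Synchronous style: the result is a copy that outlives the call, so the
// caller must close() it. Not recommended for high-throughput paths.
const stylized = faceStylizer.stylize(imageElement);
if (stylized) {
  console.log(`stylized ${stylized.width}x${stylized.height}`);
  stylized.close();
}
```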
+ * @param callback The callback that is invoked with the stylized image or + * `null` if no face was detected. The lifetime of the returned data is only + * guaranteed for the duration of the callback. */ stylizeForVideo( videoFrame: ImageSource, timestamp: number, callback: FaceStylizerCallback): void; /** - * Performs face stylization on the provided video frame. Only use this - * method when the FaceStylizer is created with the video running mode. + * Performs face stylization on the provided video frame and invokes the + * callback with result. The method returns synchronously once the callback + * returns. Only use this method when the FaceStylizer is created with the + * video running mode. * * The 'imageProcessingOptions' parameter can be used to specify one or all * of: @@ -218,34 +261,83 @@ export class FaceStylizer extends VisionTaskRunner { * monotonically increasing. * * @param videoFrame A video frame to process. + * @param timestamp The timestamp of the current frame, in ms. * @param imageProcessingOptions the `ImageProcessingOptions` specifying how * to process the input image before running inference. - * @param timestamp The timestamp of the current frame, in ms. - * @param callback The callback that is invoked with the stylized image. The - * lifetime of the returned data is only guaranteed for the duration of - * the callback. + * @param callback The callback that is invoked with the stylized image or + * `null` if no face was detected. The lifetime of the returned data is only + * guaranteed for the duration of the callback. */ stylizeForVideo( - videoFrame: ImageSource, imageProcessingOptions: ImageProcessingOptions, - timestamp: number, callback: FaceStylizerCallback): void; + videoFrame: ImageSource, timestamp: number, + imageProcessingOptions: ImageProcessingOptions, + callback: FaceStylizerCallback): void; + /** + * Performs face stylization on the provided video frame. This method creates + * a copy of the resulting image and should not be used in high-throughput + * applictions. Only use this method when the FaceStylizer is created with the + * video running mode. + * + * The input frame can be of any size. It's required to provide the video + * frame's timestamp (in milliseconds). The input timestamps must be + * monotonically increasing. + * + * @param videoFrame A video frame to process. + * @param timestamp The timestamp of the current frame, in ms. + * @return A stylized face or `null` if no face was detected. The result is + * copied to avoid lifetime issues. + */ + stylizeForVideo(videoFrame: ImageSource, timestamp: number): MPImage|null; + /** + * Performs face stylization on the provided video frame. This method creates + * a copy of the resulting image and should not be used in high-throughput + * applictions. Only use this method when the FaceStylizer is created with the + * video running mode. + * + * The 'imageProcessingOptions' parameter can be used to specify one or all + * of: + * - the rotation to apply to the image before performing stylization, by + * setting its 'rotationDegrees' property. + * - the region-of-interest on which to perform stylization, by setting its + * 'regionOfInterest' property. If not specified, the full image is used. + * If both are specified, the crop around the region-of-interest is + * extracted first, then the specified rotation is applied to the crop. + * + * The input frame can be of any size. It's required to provide the video + * frame's timestamp (in milliseconds). 
The input timestamps must be + * monotonically increasing. + * + * @param videoFrame A video frame to process. + * @param timestamp The timestamp of the current frame, in ms. + * @param imageProcessingOptions the `ImageProcessingOptions` specifying how + * to process the input image before running inference. + * @return A stylized face or `null` if no face was detected. The result is + * copied to avoid lifetime issues. + */ stylizeForVideo( videoFrame: ImageSource, - timestampOrImageProcessingOptions: number|ImageProcessingOptions, - timestampOrCallback: number|FaceStylizerCallback, - callback?: FaceStylizerCallback): void { + timestamp: number, + imageProcessingOptions: ImageProcessingOptions, + ): MPImage|null; + stylizeForVideo( + videoFrame: ImageSource, timestamp: number, + imageProcessingOptionsOrCallback?: ImageProcessingOptions| + FaceStylizerCallback, + callback?: FaceStylizerCallback): MPImage|null|void { const imageProcessingOptions = - typeof timestampOrImageProcessingOptions !== 'number' ? - timestampOrImageProcessingOptions : + typeof imageProcessingOptionsOrCallback !== 'function' ? + imageProcessingOptionsOrCallback : {}; - const timestamp = typeof timestampOrImageProcessingOptions === 'number' ? - timestampOrImageProcessingOptions : - timestampOrCallback as number; - this.userCallback = typeof timestampOrCallback === 'function' ? - timestampOrCallback : - callback!; + this.userCallback = typeof imageProcessingOptionsOrCallback === 'function' ? + imageProcessingOptionsOrCallback : + callback; this.processVideoData(videoFrame, imageProcessingOptions, timestamp); - this.userCallback = () => {}; + this.userCallback = undefined; + + if (!this.userCallback) { + return this.result; + } } /** Updates the MediaPipe graph configuration. */ @@ -270,13 +362,20 @@ export class FaceStylizer extends VisionTaskRunner { this.graphRunner.attachImageListener( STYLIZED_IMAGE_STREAM, (wasmImage, timestamp) => { - const mpImage = this.convertToMPImage(wasmImage); - this.userCallback(mpImage); + const mpImage = this.convertToMPImage( + wasmImage, /* shouldCopyData= */ !this.userCallback); + this.result = mpImage; + if (this.userCallback) { + this.userCallback(mpImage); + } this.setLatestOutputTimestamp(timestamp); }); this.graphRunner.attachEmptyPacketListener( STYLIZED_IMAGE_STREAM, timestamp => { - this.userCallback(null); + this.result = null; + if (this.userCallback) { + this.userCallback(null); + } this.setLatestOutputTimestamp(timestamp); }); diff --git a/mediapipe/tasks/web/vision/face_stylizer/face_stylizer_test.ts b/mediapipe/tasks/web/vision/face_stylizer/face_stylizer_test.ts index 17764c9e5..c092bf0f8 100644 --- a/mediapipe/tasks/web/vision/face_stylizer/face_stylizer_test.ts +++ b/mediapipe/tasks/web/vision/face_stylizer/face_stylizer_test.ts @@ -19,7 +19,6 @@ import 'jasmine'; // Placeholder for internal dependency on encodeByteArray import {CalculatorGraphConfig} from '../../../../framework/calculator_pb'; import {addJasmineCustomFloatEqualityTester, createSpyWasmModule, MediapipeTasksFake, SpyWasmModule, verifyGraph, verifyListenersRegistered} from '../../../../tasks/web/core/task_runner_test_utils'; -import {MPImage} from '../../../../tasks/web/vision/core/image'; import {WasmImage} from '../../../../web/graph_runner/graph_runner_image_lib'; import {FaceStylizer} from './face_stylizer'; @@ -99,6 +98,30 @@ describe('FaceStylizer', () => { ]); }); + it('returns result', () => { + if (typeof ImageData === 'undefined') { + console.log('ImageData tests are not supported on Node'); + 
return; + } + + // Pass the test data to our listener + faceStylizer.fakeWasmModule._waitUntilIdle.and.callFake(() => { + verifyListenersRegistered(faceStylizer); + faceStylizer.imageListener! + ({data: new Uint8Array([1, 1, 1, 1]), width: 1, height: 1}, + /* timestamp= */ 1337); + }); + + // Invoke the face stylizeer + const image = faceStylizer.stylize({} as HTMLImageElement); + expect(faceStylizer.fakeWasmModule._waitUntilIdle).toHaveBeenCalled(); + expect(image).not.toBeNull(); + expect(image!.hasImageData()).toBeTrue(); + expect(image!.width).toEqual(1); + expect(image!.height).toEqual(1); + image!.close(); + }); + it('invokes callback', (done) => { if (typeof ImageData === 'undefined') { console.log('ImageData tests are not supported on Node'); @@ -110,7 +133,7 @@ describe('FaceStylizer', () => { faceStylizer.fakeWasmModule._waitUntilIdle.and.callFake(() => { verifyListenersRegistered(faceStylizer); faceStylizer.imageListener! - ({data: new Uint8ClampedArray([1, 1, 1, 1]), width: 1, height: 1}, + ({data: new Uint8Array([1, 1, 1, 1]), width: 1, height: 1}, /* timestamp= */ 1337); }); @@ -118,35 +141,14 @@ describe('FaceStylizer', () => { faceStylizer.stylize({} as HTMLImageElement, image => { expect(faceStylizer.fakeWasmModule._waitUntilIdle).toHaveBeenCalled(); expect(image).not.toBeNull(); - expect(image!.has(MPImage.TYPE.IMAGE_DATA)).toBeTrue(); + expect(image!.hasImageData()).toBeTrue(); expect(image!.width).toEqual(1); expect(image!.height).toEqual(1); done(); }); }); - it('invokes callback even when no faes are detected', (done) => { - if (typeof ImageData === 'undefined') { - console.log('ImageData tests are not supported on Node'); - done(); - return; - } - - // Pass the test data to our listener - faceStylizer.fakeWasmModule._waitUntilIdle.and.callFake(() => { - verifyListenersRegistered(faceStylizer); - faceStylizer.emptyPacketListener!(/* timestamp= */ 1337); - }); - - // Invoke the face stylizeer - faceStylizer.stylize({} as HTMLImageElement, image => { - expect(faceStylizer.fakeWasmModule._waitUntilIdle).toHaveBeenCalled(); - expect(image).toBeNull(); - done(); - }); - }); - - it('invokes callback even when no faes are detected', (done) => { + it('invokes callback even when no faces are detected', (done) => { // Pass the test data to our listener faceStylizer.fakeWasmModule._waitUntilIdle.and.callFake(() => { verifyListenersRegistered(faceStylizer); diff --git a/mediapipe/tasks/web/vision/image_segmenter/BUILD b/mediapipe/tasks/web/vision/image_segmenter/BUILD index 6c1829bd3..1a008cc95 100644 --- a/mediapipe/tasks/web/vision/image_segmenter/BUILD +++ b/mediapipe/tasks/web/vision/image_segmenter/BUILD @@ -35,7 +35,7 @@ mediapipe_ts_declaration( deps = [ "//mediapipe/tasks/web/core", "//mediapipe/tasks/web/core:classifier_options", - "//mediapipe/tasks/web/vision/core:image", + "//mediapipe/tasks/web/vision/core:mask", "//mediapipe/tasks/web/vision/core:vision_task_options", ], ) @@ -52,7 +52,7 @@ mediapipe_ts_library( "//mediapipe/framework:calculator_jspb_proto", "//mediapipe/tasks/web/core", "//mediapipe/tasks/web/core:task_runner_test_utils", - "//mediapipe/tasks/web/vision/core:image", + "//mediapipe/tasks/web/vision/core:mask", "//mediapipe/web/graph_runner:graph_runner_image_lib_ts", ], ) diff --git a/mediapipe/tasks/web/vision/image_segmenter/image_segmenter.ts b/mediapipe/tasks/web/vision/image_segmenter/image_segmenter.ts index 60b965345..39e57d94e 100644 --- a/mediapipe/tasks/web/vision/image_segmenter/image_segmenter.ts +++ 
b/mediapipe/tasks/web/vision/image_segmenter/image_segmenter.ts @@ -60,7 +60,7 @@ export type ImageSegmenterCallback = (result: ImageSegmenterResult) => void; export class ImageSegmenter extends VisionTaskRunner { private result: ImageSegmenterResult = {}; private labels: string[] = []; - private userCallback: ImageSegmenterCallback = () => {}; + private userCallback?: ImageSegmenterCallback; private outputCategoryMask = DEFAULT_OUTPUT_CATEGORY_MASK; private outputConfidenceMasks = DEFAULT_OUTPUT_CONFIDENCE_MASKS; private readonly options: ImageSegmenterGraphOptionsProto; @@ -224,22 +224,51 @@ export class ImageSegmenter extends VisionTaskRunner { segment( image: ImageSource, imageProcessingOptions: ImageProcessingOptions, callback: ImageSegmenterCallback): void; + /** + * Performs image segmentation on the provided single image and returns the + * segmentation result. This method creates a copy of the resulting masks and + * should not be used in high-throughput applictions. Only use this method + * when the ImageSegmenter is created with running mode `image`. + * + * @param image An image to process. + * @return The segmentation result. The data is copied to avoid lifetime + * issues. + */ + segment(image: ImageSource): ImageSegmenterResult; + /** + * Performs image segmentation on the provided single image and returns the + * segmentation result. This method creates a copy of the resulting masks and + * should not be used in high-v applictions. Only use this method when + * the ImageSegmenter is created with running mode `image`. + * + * @param image An image to process. + * @param imageProcessingOptions the `ImageProcessingOptions` specifying how + * to process the input image before running inference. + * @return The segmentation result. The data is copied to avoid lifetime + * issues. + */ + segment(image: ImageSource, imageProcessingOptions: ImageProcessingOptions): + ImageSegmenterResult; segment( image: ImageSource, - imageProcessingOptionsOrCallback: ImageProcessingOptions| + imageProcessingOptionsOrCallback?: ImageProcessingOptions| ImageSegmenterCallback, - callback?: ImageSegmenterCallback): void { + callback?: ImageSegmenterCallback): ImageSegmenterResult|void { const imageProcessingOptions = typeof imageProcessingOptionsOrCallback !== 'function' ? imageProcessingOptionsOrCallback : {}; + this.userCallback = typeof imageProcessingOptionsOrCallback === 'function' ? imageProcessingOptionsOrCallback : - callback!; + callback; this.reset(); this.processImageData(image, imageProcessingOptions); - this.userCallback = () => {}; + + if (!this.userCallback) { + return this.result; + } } /** @@ -264,35 +293,64 @@ export class ImageSegmenter extends VisionTaskRunner { * created with running mode `video`. * * @param videoFrame A video frame to process. - * @param imageProcessingOptions the `ImageProcessingOptions` specifying how - * to process the input image before running inference. * @param timestamp The timestamp of the current frame, in ms. + * @param imageProcessingOptions the `ImageProcessingOptions` specifying how + * to process the input frame before running inference. * @param callback The callback that is invoked with the segmented masks. The * lifetime of the returned data is only guaranteed for the duration of the * callback. 
*/ segmentForVideo( - videoFrame: ImageSource, imageProcessingOptions: ImageProcessingOptions, - timestamp: number, callback: ImageSegmenterCallback): void; + videoFrame: ImageSource, timestamp: number, + imageProcessingOptions: ImageProcessingOptions, + callback: ImageSegmenterCallback): void; + /** + * Performs image segmentation on the provided video frame and returns the + * segmentation result. This method creates a copy of the resulting masks and + * should not be used in high-throughput applictions. Only use this method + * when the ImageSegmenter is created with running mode `video`. + * + * @param videoFrame A video frame to process. + * @return The segmentation result. The data is copied to avoid lifetime + * issues. + */ + segmentForVideo(videoFrame: ImageSource, timestamp: number): + ImageSegmenterResult; + /** + * Performs image segmentation on the provided video frame and returns the + * segmentation result. This method creates a copy of the resulting masks and + * should not be used in high-v applictions. Only use this method when + * the ImageSegmenter is created with running mode `video`. + * + * @param videoFrame A video frame to process. + * @param timestamp The timestamp of the current frame, in ms. + * @param imageProcessingOptions the `ImageProcessingOptions` specifying how + * to process the input frame before running inference. + * @return The segmentation result. The data is copied to avoid lifetime + * issues. + */ segmentForVideo( - videoFrame: ImageSource, - timestampOrImageProcessingOptions: number|ImageProcessingOptions, - timestampOrCallback: number|ImageSegmenterCallback, - callback?: ImageSegmenterCallback): void { + videoFrame: ImageSource, timestamp: number, + imageProcessingOptions: ImageProcessingOptions): ImageSegmenterResult; + segmentForVideo( + videoFrame: ImageSource, timestamp: number, + imageProcessingOptionsOrCallback?: ImageProcessingOptions| + ImageSegmenterCallback, + callback?: ImageSegmenterCallback): ImageSegmenterResult|void { const imageProcessingOptions = - typeof timestampOrImageProcessingOptions !== 'number' ? - timestampOrImageProcessingOptions : + typeof imageProcessingOptionsOrCallback !== 'function' ? + imageProcessingOptionsOrCallback : {}; - const timestamp = typeof timestampOrImageProcessingOptions === 'number' ? - timestampOrImageProcessingOptions : - timestampOrCallback as number; - this.userCallback = typeof timestampOrCallback === 'function' ? - timestampOrCallback : - callback!; + this.userCallback = typeof imageProcessingOptionsOrCallback === 'function' ? + imageProcessingOptionsOrCallback : + callback; this.reset(); this.processVideoData(videoFrame, imageProcessingOptions, timestamp); - this.userCallback = () => {}; + + if (!this.userCallback) { + return this.result; + } } /** @@ -323,7 +381,9 @@ export class ImageSegmenter extends VisionTaskRunner { return; } - this.userCallback(this.result); + if (this.userCallback) { + this.userCallback(this.result); + } } /** Updates the MediaPipe graph configuration. 
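For the video path, note that the reordered signature now takes the timestamp before the image-processing options. A sketch of both calling styles under that ordering, assuming an `imageSegmenter` created in `video` running mode and a `video` element (illustrative names):

```ts
// Callback style: no copies are made; masks are only valid inside the
// callback.
imageSegmenter.segmentForVideo(video, performance.now(), {}, result => {
  const mask = result.confidenceMasks?.[0];
  if (mask) {
    console.log(`confidence mask ${mask.width}x${mask.height}`);
  }
});

// Synchronous style: masks are cloned so they outlive the call; close them
// once processed to release their resources.
const result = imageSegmenter.segmentForVideo(video, performance.now());
result.confidenceMasks?.forEach(mask => mask.close());
result.categoryMask?.close();
```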
*/ @@ -351,8 +411,9 @@ export class ImageSegmenter extends VisionTaskRunner { this.graphRunner.attachImageVectorListener( CONFIDENCE_MASKS_STREAM, (masks, timestamp) => { - this.result.confidenceMasks = - masks.map(wasmImage => this.convertToMPImage(wasmImage)); + this.result.confidenceMasks = masks.map( + wasmImage => this.convertToMPMask( + wasmImage, /* shouldCopyData= */ !this.userCallback)); this.setLatestOutputTimestamp(timestamp); this.maybeInvokeCallback(); }); @@ -370,7 +431,8 @@ export class ImageSegmenter extends VisionTaskRunner { this.graphRunner.attachImageListener( CATEGORY_MASK_STREAM, (mask, timestamp) => { - this.result.categoryMask = this.convertToMPImage(mask); + this.result.categoryMask = this.convertToMPMask( + mask, /* shouldCopyData= */ !this.userCallback); this.setLatestOutputTimestamp(timestamp); this.maybeInvokeCallback(); }); diff --git a/mediapipe/tasks/web/vision/image_segmenter/image_segmenter_result.d.ts b/mediapipe/tasks/web/vision/image_segmenter/image_segmenter_result.d.ts index 454ec27ea..25962d57e 100644 --- a/mediapipe/tasks/web/vision/image_segmenter/image_segmenter_result.d.ts +++ b/mediapipe/tasks/web/vision/image_segmenter/image_segmenter_result.d.ts @@ -14,7 +14,7 @@ * limitations under the License. */ -import {MPImage} from '../../../../tasks/web/vision/core/image'; +import {MPMask} from '../../../../tasks/web/vision/core/mask'; /** The output result of ImageSegmenter. */ export declare interface ImageSegmenterResult { @@ -23,12 +23,12 @@ export declare interface ImageSegmenterResult { * `MPImage`s where, for each mask, each pixel represents the prediction * confidence, usually in the [0, 1] range. */ - confidenceMasks?: MPImage[]; + confidenceMasks?: MPMask[]; /** * A category mask represented as a `Uint8ClampedArray` or * `WebGLTexture`-backed `MPImage` where each pixel represents the class which * the pixel in the original image was predicted to belong to. 
*/ - categoryMask?: MPImage; + categoryMask?: MPMask; } diff --git a/mediapipe/tasks/web/vision/image_segmenter/image_segmenter_test.ts b/mediapipe/tasks/web/vision/image_segmenter/image_segmenter_test.ts index c1ccd7997..f9172ecd3 100644 --- a/mediapipe/tasks/web/vision/image_segmenter/image_segmenter_test.ts +++ b/mediapipe/tasks/web/vision/image_segmenter/image_segmenter_test.ts @@ -19,8 +19,8 @@ import 'jasmine'; // Placeholder for internal dependency on encodeByteArray import {CalculatorGraphConfig} from '../../../../framework/calculator_pb'; import {addJasmineCustomFloatEqualityTester, createSpyWasmModule, MediapipeTasksFake, SpyWasmModule, verifyGraph} from '../../../../tasks/web/core/task_runner_test_utils'; +import {MPMask} from '../../../../tasks/web/vision/core/mask'; import {WasmImage} from '../../../../web/graph_runner/graph_runner_image_lib'; -import {MPImage} from '../../../../tasks/web/vision/core/image'; import {ImageSegmenter} from './image_segmenter'; import {ImageSegmenterOptions} from './image_segmenter_options'; @@ -165,7 +165,7 @@ describe('ImageSegmenter', () => { }); it('supports category mask', async () => { - const mask = new Uint8ClampedArray([1, 2, 3, 4]); + const mask = new Uint8Array([1, 2, 3, 4]); await imageSegmenter.setOptions( {outputCategoryMask: true, outputConfidenceMasks: false}); @@ -183,7 +183,7 @@ describe('ImageSegmenter', () => { return new Promise(resolve => { imageSegmenter.segment({} as HTMLImageElement, result => { expect(imageSegmenter.fakeWasmModule._waitUntilIdle).toHaveBeenCalled(); - expect(result.categoryMask).toBeInstanceOf(MPImage); + expect(result.categoryMask).toBeInstanceOf(MPMask); expect(result.confidenceMasks).not.toBeDefined(); expect(result.categoryMask!.width).toEqual(2); expect(result.categoryMask!.height).toEqual(2); @@ -216,18 +216,18 @@ describe('ImageSegmenter', () => { expect(imageSegmenter.fakeWasmModule._waitUntilIdle).toHaveBeenCalled(); expect(result.categoryMask).not.toBeDefined(); - expect(result.confidenceMasks![0]).toBeInstanceOf(MPImage); + expect(result.confidenceMasks![0]).toBeInstanceOf(MPMask); expect(result.confidenceMasks![0].width).toEqual(2); expect(result.confidenceMasks![0].height).toEqual(2); - expect(result.confidenceMasks![1]).toBeInstanceOf(MPImage); + expect(result.confidenceMasks![1]).toBeInstanceOf(MPMask); resolve(); }); }); }); it('supports combined category and confidence masks', async () => { - const categoryMask = new Uint8ClampedArray([1]); + const categoryMask = new Uint8Array([1]); const confidenceMask1 = new Float32Array([0.0]); const confidenceMask2 = new Float32Array([1.0]); @@ -252,19 +252,19 @@ describe('ImageSegmenter', () => { // Invoke the image segmenter imageSegmenter.segment({} as HTMLImageElement, result => { expect(imageSegmenter.fakeWasmModule._waitUntilIdle).toHaveBeenCalled(); - expect(result.categoryMask).toBeInstanceOf(MPImage); + expect(result.categoryMask).toBeInstanceOf(MPMask); expect(result.categoryMask!.width).toEqual(1); expect(result.categoryMask!.height).toEqual(1); - expect(result.confidenceMasks![0]).toBeInstanceOf(MPImage); - expect(result.confidenceMasks![1]).toBeInstanceOf(MPImage); + expect(result.confidenceMasks![0]).toBeInstanceOf(MPMask); + expect(result.confidenceMasks![1]).toBeInstanceOf(MPMask); resolve(); }); }); }); - it('invokes listener once masks are avaiblae', async () => { - const categoryMask = new Uint8ClampedArray([1]); + it('invokes listener once masks are available', async () => { + const categoryMask = new Uint8Array([1]); const 
confidenceMask = new Float32Array([0.0]); let listenerCalled = false; @@ -292,4 +292,21 @@ describe('ImageSegmenter', () => { }); }); }); + + it('returns result', () => { + const confidenceMask = new Float32Array([0.0]); + + // Pass the test data to our listener + imageSegmenter.fakeWasmModule._waitUntilIdle.and.callFake(() => { + imageSegmenter.confidenceMasksListener!( + [ + {data: confidenceMask, width: 1, height: 1}, + ], + 1337); + }); + + const result = imageSegmenter.segment({} as HTMLImageElement); + expect(result.confidenceMasks![0]).toBeInstanceOf(MPMask); + result.confidenceMasks![0].close(); + }); }); diff --git a/mediapipe/tasks/web/vision/index.ts b/mediapipe/tasks/web/vision/index.ts index 34c1206cc..5b643b84e 100644 --- a/mediapipe/tasks/web/vision/index.ts +++ b/mediapipe/tasks/web/vision/index.ts @@ -16,7 +16,8 @@ import {FilesetResolver as FilesetResolverImpl} from '../../../tasks/web/core/fileset_resolver'; import {DrawingUtils as DrawingUtilsImpl} from '../../../tasks/web/vision/core/drawing_utils'; -import {MPImage as MPImageImpl, MPImageType as MPImageTypeImpl} from '../../../tasks/web/vision/core/image'; +import {MPImage as MPImageImpl} from '../../../tasks/web/vision/core/image'; +import {MPMask as MPMaskImpl} from '../../../tasks/web/vision/core/mask'; import {FaceDetector as FaceDetectorImpl} from '../../../tasks/web/vision/face_detector/face_detector'; import {FaceLandmarker as FaceLandmarkerImpl, FaceLandmarksConnections as FaceLandmarksConnectionsImpl} from '../../../tasks/web/vision/face_landmarker/face_landmarker'; import {FaceStylizer as FaceStylizerImpl} from '../../../tasks/web/vision/face_stylizer/face_stylizer'; @@ -34,7 +35,7 @@ import {PoseLandmarker as PoseLandmarkerImpl} from '../../../tasks/web/vision/po const DrawingUtils = DrawingUtilsImpl; const FilesetResolver = FilesetResolverImpl; const MPImage = MPImageImpl; -const MPImageType = MPImageTypeImpl; +const MPMask = MPMaskImpl; const FaceDetector = FaceDetectorImpl; const FaceLandmarker = FaceLandmarkerImpl; const FaceLandmarksConnections = FaceLandmarksConnectionsImpl; @@ -52,7 +53,7 @@ export { DrawingUtils, FilesetResolver, MPImage, - MPImageType, + MPMask, FaceDetector, FaceLandmarker, FaceLandmarksConnections, diff --git a/mediapipe/tasks/web/vision/interactive_segmenter/BUILD b/mediapipe/tasks/web/vision/interactive_segmenter/BUILD index c3be79ebf..57b0946a2 100644 --- a/mediapipe/tasks/web/vision/interactive_segmenter/BUILD +++ b/mediapipe/tasks/web/vision/interactive_segmenter/BUILD @@ -37,7 +37,7 @@ mediapipe_ts_declaration( deps = [ "//mediapipe/tasks/web/core", "//mediapipe/tasks/web/core:classifier_options", - "//mediapipe/tasks/web/vision/core:image", + "//mediapipe/tasks/web/vision/core:mask", "//mediapipe/tasks/web/vision/core:vision_task_options", ], ) @@ -54,7 +54,7 @@ mediapipe_ts_library( "//mediapipe/framework:calculator_jspb_proto", "//mediapipe/tasks/web/core", "//mediapipe/tasks/web/core:task_runner_test_utils", - "//mediapipe/tasks/web/vision/core:image", + "//mediapipe/tasks/web/vision/core:mask", "//mediapipe/util:render_data_jspb_proto", "//mediapipe/web/graph_runner:graph_runner_image_lib_ts", ], diff --git a/mediapipe/tasks/web/vision/interactive_segmenter/interactive_segmenter.ts b/mediapipe/tasks/web/vision/interactive_segmenter/interactive_segmenter.ts index 67d6ec3f6..2a51a5fcf 100644 --- a/mediapipe/tasks/web/vision/interactive_segmenter/interactive_segmenter.ts +++ b/mediapipe/tasks/web/vision/interactive_segmenter/interactive_segmenter.ts @@ -86,7 +86,7 @@ 
export class InteractiveSegmenter extends VisionTaskRunner { private result: InteractiveSegmenterResult = {}; private outputCategoryMask = DEFAULT_OUTPUT_CATEGORY_MASK; private outputConfidenceMasks = DEFAULT_OUTPUT_CONFIDENCE_MASKS; - private userCallback: InteractiveSegmenterCallback = () => {}; + private userCallback?: InteractiveSegmenterCallback; private readonly options: ImageSegmenterGraphOptionsProto; private readonly segmenterOptions: SegmenterOptionsProto; @@ -186,14 +186,9 @@ export class InteractiveSegmenter extends VisionTaskRunner { /** * Performs interactive segmentation on the provided single image and invokes - * the callback with the response. The `roi` parameter is used to represent a - * user's region of interest for segmentation. - * - * If the output_type is `CATEGORY_MASK`, the callback is invoked with vector - * of images that represent per-category segmented image mask. If the - * output_type is `CONFIDENCE_MASK`, the callback is invoked with a vector of - * images that contains only one confidence image mask. The method returns - * synchronously once the callback returns. + * the callback with the response. The method returns synchronously once the + * callback returns. The `roi` parameter is used to represent a user's region + * of interest for segmentation. * * @param image An image to process. * @param roi The region of interest for segmentation. @@ -206,8 +201,9 @@ export class InteractiveSegmenter extends VisionTaskRunner { callback: InteractiveSegmenterCallback): void; /** * Performs interactive segmentation on the provided single image and invokes - * the callback with the response. The `roi` parameter is used to represent a - * user's region of interest for segmentation. + * the callback with the response. The method returns synchronously once the + * callback returns. The `roi` parameter is used to represent a user's region + * of interest for segmentation. * * The 'image_processing_options' parameter can be used to specify the * rotation to apply to the image before performing segmentation, by setting @@ -215,12 +211,6 @@ export class InteractiveSegmenter extends VisionTaskRunner { * using the 'regionOfInterest' field is NOT supported and will result in an * error. * - * If the output_type is `CATEGORY_MASK`, the callback is invoked with vector - * of images that represent per-category segmented image mask. If the - * output_type is `CONFIDENCE_MASK`, the callback is invoked with a vector of - * images that contains only one confidence image mask. The method returns - * synchronously once the callback returns. - * * @param image An image to process. * @param roi The region of interest for segmentation. * @param imageProcessingOptions the `ImageProcessingOptions` specifying how @@ -233,23 +223,63 @@ export class InteractiveSegmenter extends VisionTaskRunner { image: ImageSource, roi: RegionOfInterest, imageProcessingOptions: ImageProcessingOptions, callback: InteractiveSegmenterCallback): void; + /** + * Performs interactive segmentation on the provided video frame and returns + * the segmentation result. This method creates a copy of the resulting masks + * and should not be used in high-throughput applictions. The `roi` parameter + * is used to represent a user's region of interest for segmentation. + * + * @param image An image to process. + * @param roi The region of interest for segmentation. + * @return The segmentation result. The data is copied to avoid lifetime + * limits. 
+ */ + segment(image: ImageSource, roi: RegionOfInterest): + InteractiveSegmenterResult; + /** + * Performs interactive segmentation on the provided video frame and returns + * the segmentation result. This method creates a copy of the resulting masks + * and should not be used in high-throughput applictions. The `roi` parameter + * is used to represent a user's region of interest for segmentation. + * + * The 'image_processing_options' parameter can be used to specify the + * rotation to apply to the image before performing segmentation, by setting + * its 'rotationDegrees' field. Note that specifying a region-of-interest + * using the 'regionOfInterest' field is NOT supported and will result in an + * error. + * + * @param image An image to process. + * @param roi The region of interest for segmentation. + * @param imageProcessingOptions the `ImageProcessingOptions` specifying how + * to process the input image before running inference. + * @return The segmentation result. The data is copied to avoid lifetime + * limits. + */ segment( image: ImageSource, roi: RegionOfInterest, - imageProcessingOptionsOrCallback: ImageProcessingOptions| + imageProcessingOptions: ImageProcessingOptions): + InteractiveSegmenterResult; + segment( + image: ImageSource, roi: RegionOfInterest, + imageProcessingOptionsOrCallback?: ImageProcessingOptions| InteractiveSegmenterCallback, - callback?: InteractiveSegmenterCallback): void { + callback?: InteractiveSegmenterCallback): InteractiveSegmenterResult| + void { const imageProcessingOptions = typeof imageProcessingOptionsOrCallback !== 'function' ? imageProcessingOptionsOrCallback : {}; this.userCallback = typeof imageProcessingOptionsOrCallback === 'function' ? imageProcessingOptionsOrCallback : - callback!; + callback; this.reset(); this.processRenderData(roi, this.getSynctheticTimestamp()); this.processImageData(image, imageProcessingOptions); - this.userCallback = () => {}; + + if (!this.userCallback) { + return this.result; + } } private reset(): void { @@ -265,7 +295,9 @@ export class InteractiveSegmenter extends VisionTaskRunner { return; } - this.userCallback(this.result); + if (this.userCallback) { + this.userCallback(this.result); + } } /** Updates the MediaPipe graph configuration. 
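The reworked `segment()` above supports two calling styles with different mask lifetimes. A short sketch of both, assuming an already constructed `interactiveSegmenter` and an `HTMLImageElement` named `image`:

```ts
// Callback style: masks are only valid for the duration of the callback.
interactiveSegmenter.segment(image, {keypoint: {x: 0.5, y: 0.5}}, result => {
  const mask = result.confidenceMasks?.[0];
  console.log(mask?.width, mask?.height);
  // Do not retain `mask` beyond this callback.
});

// Result style: no callback, so the masks are copied and returned.
// The caller owns the copies and must close them when finished.
const result = interactiveSegmenter.segment(image, {keypoint: {x: 0.5, y: 0.5}});
result.confidenceMasks?.forEach(mask => mask.close());
```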
*/ @@ -295,8 +327,9 @@ export class InteractiveSegmenter extends VisionTaskRunner { this.graphRunner.attachImageVectorListener( CONFIDENCE_MASKS_STREAM, (masks, timestamp) => { - this.result.confidenceMasks = - masks.map(wasmImage => this.convertToMPImage(wasmImage)); + this.result.confidenceMasks = masks.map( + wasmImage => this.convertToMPMask( + wasmImage, /* shouldCopyData= */ !this.userCallback)); this.setLatestOutputTimestamp(timestamp); this.maybeInvokeCallback(); }); @@ -314,7 +347,8 @@ export class InteractiveSegmenter extends VisionTaskRunner { this.graphRunner.attachImageListener( CATEGORY_MASK_STREAM, (mask, timestamp) => { - this.result.categoryMask = this.convertToMPImage(mask); + this.result.categoryMask = this.convertToMPMask( + mask, /* shouldCopyData= */ !this.userCallback); this.setLatestOutputTimestamp(timestamp); this.maybeInvokeCallback(); }); @@ -338,16 +372,31 @@ export class InteractiveSegmenter extends VisionTaskRunner { const renderData = new RenderDataProto(); const renderAnnotation = new RenderAnnotationProto(); - const color = new ColorProto(); color.setR(255); renderAnnotation.setColor(color); - const point = new RenderAnnotationProto.Point(); - point.setNormalized(true); - point.setX(roi.keypoint.x); - point.setY(roi.keypoint.y); - renderAnnotation.setPoint(point); + if (roi.keypoint && roi.scribble) { + throw new Error('Cannot provide both keypoint and scribble.'); + } else if (roi.keypoint) { + const point = new RenderAnnotationProto.Point(); + point.setNormalized(true); + point.setX(roi.keypoint.x); + point.setY(roi.keypoint.y); + renderAnnotation.setPoint(point); + } else if (roi.scribble) { + const scribble = new RenderAnnotationProto.Scribble(); + for (const coord of roi.scribble) { + const point = new RenderAnnotationProto.Point(); + point.setNormalized(true); + point.setX(coord.x); + point.setY(coord.y); + scribble.addPoint(point); + } + renderAnnotation.setScribble(scribble); + } else { + throw new Error('Must provide either a keypoint or a scribble.'); + } renderData.addRenderAnnotations(renderAnnotation); diff --git a/mediapipe/tasks/web/vision/interactive_segmenter/interactive_segmenter_result.d.ts b/mediapipe/tasks/web/vision/interactive_segmenter/interactive_segmenter_result.d.ts index bc2962936..e773b5e64 100644 --- a/mediapipe/tasks/web/vision/interactive_segmenter/interactive_segmenter_result.d.ts +++ b/mediapipe/tasks/web/vision/interactive_segmenter/interactive_segmenter_result.d.ts @@ -14,7 +14,7 @@ * limitations under the License. */ -import {MPImage} from '../../../../tasks/web/vision/core/image'; +import {MPMask} from '../../../../tasks/web/vision/core/mask'; /** The output result of InteractiveSegmenter. */ export declare interface InteractiveSegmenterResult { @@ -23,12 +23,12 @@ export declare interface InteractiveSegmenterResult { * `MPImage`s where, for each mask, each pixel represents the prediction * confidence, usually in the [0, 1] range. */ - confidenceMasks?: MPImage[]; + confidenceMasks?: MPMask[]; /** * A category mask represented as a `Uint8ClampedArray` or * `WebGLTexture`-backed `MPImage` where each pixel represents the class which * the pixel in the original image was predicted to belong to. 
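The `processRenderData()` change above makes the keypoint and scribble inputs mutually exclusive. A small sketch of the accepted `RegionOfInterest` shapes, assuming `RegionOfInterest`, `interactiveSegmenter`, and `image` are in scope; the rejection messages are taken from the code above:

```ts
// Exactly one of `keypoint` or `scribble` must be set; coordinates are normalized.
const keypointRoi: RegionOfInterest = {keypoint: {x: 0.1, y: 0.2}};
const scribbleRoi: RegionOfInterest = {
  scribble: [{x: 0.1, y: 0.2}, {x: 0.3, y: 0.4}],
};

// Supplying both throws 'Cannot provide both keypoint and scribble.';
// supplying neither throws 'Must provide either a keypoint or a scribble.'.
interactiveSegmenter.segment(image, scribbleRoi, result => {
  result.confidenceMasks?.forEach(mask => console.log(mask.width, mask.height));
});
```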
*/ - categoryMask?: MPImage; + categoryMask?: MPMask; } diff --git a/mediapipe/tasks/web/vision/interactive_segmenter/interactive_segmenter_test.ts b/mediapipe/tasks/web/vision/interactive_segmenter/interactive_segmenter_test.ts index 84ecde00b..c5603c5c6 100644 --- a/mediapipe/tasks/web/vision/interactive_segmenter/interactive_segmenter_test.ts +++ b/mediapipe/tasks/web/vision/interactive_segmenter/interactive_segmenter_test.ts @@ -19,17 +19,21 @@ import 'jasmine'; // Placeholder for internal dependency on encodeByteArray import {CalculatorGraphConfig} from '../../../../framework/calculator_pb'; import {addJasmineCustomFloatEqualityTester, createSpyWasmModule, MediapipeTasksFake, SpyWasmModule, verifyGraph} from '../../../../tasks/web/core/task_runner_test_utils'; -import {MPImage} from '../../../../tasks/web/vision/core/image'; +import {MPMask} from '../../../../tasks/web/vision/core/mask'; import {RenderData as RenderDataProto} from '../../../../util/render_data_pb'; import {WasmImage} from '../../../../web/graph_runner/graph_runner_image_lib'; import {InteractiveSegmenter, RegionOfInterest} from './interactive_segmenter'; -const ROI: RegionOfInterest = { +const KEYPOINT: RegionOfInterest = { keypoint: {x: 0.1, y: 0.2} }; +const SCRIBBLE: RegionOfInterest = { + scribble: [{x: 0.1, y: 0.2}, {x: 0.3, y: 0.4}] +}; + class InteractiveSegmenterFake extends InteractiveSegmenter implements MediapipeTasksFake { calculatorName = @@ -134,26 +138,46 @@ describe('InteractiveSegmenter', () => { it('doesn\'t support region of interest', () => { expect(() => { interactiveSegmenter.segment( - {} as HTMLImageElement, ROI, + {} as HTMLImageElement, KEYPOINT, {regionOfInterest: {left: 0, right: 0, top: 0, bottom: 0}}, () => {}); }).toThrowError('This task doesn\'t support region-of-interest.'); }); - it('sends region-of-interest', (done) => { + it('sends region-of-interest with keypoint', (done) => { interactiveSegmenter.fakeWasmModule._waitUntilIdle.and.callFake(() => { expect(interactiveSegmenter.lastRoi).toBeDefined(); expect(interactiveSegmenter.lastRoi!.toObject().renderAnnotationsList![0]) .toEqual(jasmine.objectContaining({ color: {r: 255, b: undefined, g: undefined}, + point: {x: 0.1, y: 0.2, normalized: true}, })); done(); }); - interactiveSegmenter.segment({} as HTMLImageElement, ROI, () => {}); + interactiveSegmenter.segment({} as HTMLImageElement, KEYPOINT, () => {}); + }); + + it('sends region-of-interest with scribble', (done) => { + interactiveSegmenter.fakeWasmModule._waitUntilIdle.and.callFake(() => { + expect(interactiveSegmenter.lastRoi).toBeDefined(); + expect(interactiveSegmenter.lastRoi!.toObject().renderAnnotationsList![0]) + .toEqual(jasmine.objectContaining({ + color: {r: 255, b: undefined, g: undefined}, + scribble: { + pointList: [ + {x: 0.1, y: 0.2, normalized: true}, + {x: 0.3, y: 0.4, normalized: true} + ] + }, + })); + done(); + }); + + interactiveSegmenter.segment({} as HTMLImageElement, SCRIBBLE, () => {}); }); it('supports category mask', async () => { - const mask = new Uint8ClampedArray([1, 2, 3, 4]); + const mask = new Uint8Array([1, 2, 3, 4]); await interactiveSegmenter.setOptions( {outputCategoryMask: true, outputConfidenceMasks: false}); @@ -168,10 +192,10 @@ describe('InteractiveSegmenter', () => { // Invoke the image segmenter return new Promise(resolve => { - interactiveSegmenter.segment({} as HTMLImageElement, ROI, result => { + interactiveSegmenter.segment({} as HTMLImageElement, KEYPOINT, result => { expect(interactiveSegmenter.fakeWasmModule._waitUntilIdle) 
.toHaveBeenCalled(); - expect(result.categoryMask).toBeInstanceOf(MPImage); + expect(result.categoryMask).toBeInstanceOf(MPMask); expect(result.categoryMask!.width).toEqual(2); expect(result.categoryMask!.height).toEqual(2); expect(result.confidenceMasks).not.toBeDefined(); @@ -199,23 +223,23 @@ describe('InteractiveSegmenter', () => { }); return new Promise(resolve => { // Invoke the image segmenter - interactiveSegmenter.segment({} as HTMLImageElement, ROI, result => { + interactiveSegmenter.segment({} as HTMLImageElement, KEYPOINT, result => { expect(interactiveSegmenter.fakeWasmModule._waitUntilIdle) .toHaveBeenCalled(); expect(result.categoryMask).not.toBeDefined(); - expect(result.confidenceMasks![0]).toBeInstanceOf(MPImage); + expect(result.confidenceMasks![0]).toBeInstanceOf(MPMask); expect(result.confidenceMasks![0].width).toEqual(2); expect(result.confidenceMasks![0].height).toEqual(2); - expect(result.confidenceMasks![1]).toBeInstanceOf(MPImage); + expect(result.confidenceMasks![1]).toBeInstanceOf(MPMask); resolve(); }); }); }); it('supports combined category and confidence masks', async () => { - const categoryMask = new Uint8ClampedArray([1]); + const categoryMask = new Uint8Array([1]); const confidenceMask1 = new Float32Array([0.0]); const confidenceMask2 = new Float32Array([1.0]); @@ -239,22 +263,22 @@ describe('InteractiveSegmenter', () => { return new Promise(resolve => { // Invoke the image segmenter interactiveSegmenter.segment( - {} as HTMLImageElement, ROI, result => { + {} as HTMLImageElement, KEYPOINT, result => { expect(interactiveSegmenter.fakeWasmModule._waitUntilIdle) .toHaveBeenCalled(); - expect(result.categoryMask).toBeInstanceOf(MPImage); + expect(result.categoryMask).toBeInstanceOf(MPMask); expect(result.categoryMask!.width).toEqual(1); expect(result.categoryMask!.height).toEqual(1); - expect(result.confidenceMasks![0]).toBeInstanceOf(MPImage); - expect(result.confidenceMasks![1]).toBeInstanceOf(MPImage); + expect(result.confidenceMasks![0]).toBeInstanceOf(MPMask); + expect(result.confidenceMasks![1]).toBeInstanceOf(MPMask); resolve(); }); }); }); it('invokes listener once masks are avaiblae', async () => { - const categoryMask = new Uint8ClampedArray([1]); + const categoryMask = new Uint8Array([1]); const confidenceMask = new Float32Array([0.0]); let listenerCalled = false; @@ -276,10 +300,28 @@ describe('InteractiveSegmenter', () => { }); return new Promise(resolve => { - interactiveSegmenter.segment({} as HTMLImageElement, ROI, () => { + interactiveSegmenter.segment({} as HTMLImageElement, KEYPOINT, () => { listenerCalled = true; resolve(); }); }); }); + + it('returns result', () => { + const confidenceMask = new Float32Array([0.0]); + + // Pass the test data to our listener + interactiveSegmenter.fakeWasmModule._waitUntilIdle.and.callFake(() => { + interactiveSegmenter.confidenceMasksListener!( + [ + {data: confidenceMask, width: 1, height: 1}, + ], + 1337); + }); + + const result = + interactiveSegmenter.segment({} as HTMLImageElement, KEYPOINT); + expect(result.confidenceMasks![0]).toBeInstanceOf(MPMask); + result.confidenceMasks![0].close(); + }); }); diff --git a/mediapipe/tasks/web/vision/pose_landmarker/BUILD b/mediapipe/tasks/web/vision/pose_landmarker/BUILD index 8d128ac1a..566513b40 100644 --- a/mediapipe/tasks/web/vision/pose_landmarker/BUILD +++ b/mediapipe/tasks/web/vision/pose_landmarker/BUILD @@ -45,7 +45,7 @@ mediapipe_ts_declaration( "//mediapipe/tasks/web/components/containers:category", 
"//mediapipe/tasks/web/components/containers:landmark", "//mediapipe/tasks/web/core", - "//mediapipe/tasks/web/vision/core:image", + "//mediapipe/tasks/web/vision/core:mask", "//mediapipe/tasks/web/vision/core:vision_task_options", ], ) @@ -63,7 +63,7 @@ mediapipe_ts_library( "//mediapipe/tasks/web/components/processors:landmark_result", "//mediapipe/tasks/web/core", "//mediapipe/tasks/web/core:task_runner_test_utils", - "//mediapipe/tasks/web/vision/core:image", + "//mediapipe/tasks/web/vision/core:mask", "//mediapipe/tasks/web/vision/core:vision_task_runner", ], ) diff --git a/mediapipe/tasks/web/vision/pose_landmarker/pose_landmarker.ts b/mediapipe/tasks/web/vision/pose_landmarker/pose_landmarker.ts index 2d72bf1dc..87fdacbc2 100644 --- a/mediapipe/tasks/web/vision/pose_landmarker/pose_landmarker.ts +++ b/mediapipe/tasks/web/vision/pose_landmarker/pose_landmarker.ts @@ -43,7 +43,6 @@ const IMAGE_STREAM = 'image_in'; const NORM_RECT_STREAM = 'norm_rect'; const NORM_LANDMARKS_STREAM = 'normalized_landmarks'; const WORLD_LANDMARKS_STREAM = 'world_landmarks'; -const AUXILIARY_LANDMARKS_STREAM = 'auxiliary_landmarks'; const SEGMENTATION_MASK_STREAM = 'segmentation_masks'; const POSE_LANDMARKER_GRAPH = 'mediapipe.tasks.vision.pose_landmarker.PoseLandmarkerGraph'; @@ -64,7 +63,7 @@ export type PoseLandmarkerCallback = (result: PoseLandmarkerResult) => void; export class PoseLandmarker extends VisionTaskRunner { private result: Partial = {}; private outputSegmentationMasks = false; - private userCallback: PoseLandmarkerCallback = () => {}; + private userCallback?: PoseLandmarkerCallback; private readonly options: PoseLandmarkerGraphOptions; private readonly poseLandmarksDetectorGraphOptions: PoseLandmarksDetectorGraphOptions; @@ -200,21 +199,22 @@ export class PoseLandmarker extends VisionTaskRunner { } /** - * Performs pose detection on the provided single image and waits - * synchronously for the response. Only use this method when the - * PoseLandmarker is created with running mode `image`. + * Performs pose detection on the provided single image and invokes the + * callback with the response. The method returns synchronously once the + * callback returns. Only use this method when the PoseLandmarker is created + * with running mode `image`. * * @param image An image to process. * @param callback The callback that is invoked with the result. The * lifetime of the returned masks is only guaranteed for the duration of * the callback. - * @return The detected pose landmarks. */ detect(image: ImageSource, callback: PoseLandmarkerCallback): void; /** - * Performs pose detection on the provided single image and waits - * synchronously for the response. Only use this method when the - * PoseLandmarker is created with running mode `image`. + * Performs pose detection on the provided single image and invokes the + * callback with the response. The method returns synchronously once the + * callback returns. Only use this method when the PoseLandmarker is created + * with running mode `image`. * * @param image An image to process. * @param imageProcessingOptions the `ImageProcessingOptions` specifying how @@ -222,16 +222,42 @@ export class PoseLandmarker extends VisionTaskRunner { * @param callback The callback that is invoked with the result. The * lifetime of the returned masks is only guaranteed for the duration of * the callback. - * @return The detected pose landmarks. 
*/ detect( image: ImageSource, imageProcessingOptions: ImageProcessingOptions, callback: PoseLandmarkerCallback): void; + /** + * Performs pose detection on the provided single image and waits + * synchronously for the response. This method creates a copy of the resulting + * masks and should not be used in high-throughput applictions. Only + * use this method when the PoseLandmarker is created with running mode + * `image`. + * + * @param image An image to process. + * @return The landmarker result. Any masks are copied to avoid lifetime + * limits. + * @return The detected pose landmarks. + */ + detect(image: ImageSource): PoseLandmarkerResult; + /** + * Performs pose detection on the provided single image and waits + * synchronously for the response. This method creates a copy of the resulting + * masks and should not be used in high-throughput applictions. Only + * use this method when the PoseLandmarker is created with running mode + * `image`. + * + * @param image An image to process. + * @return The landmarker result. Any masks are copied to avoid lifetime + * limits. + * @return The detected pose landmarks. + */ + detect(image: ImageSource, imageProcessingOptions: ImageProcessingOptions): + PoseLandmarkerResult; detect( image: ImageSource, - imageProcessingOptionsOrCallback: ImageProcessingOptions| + imageProcessingOptionsOrCallback?: ImageProcessingOptions| PoseLandmarkerCallback, - callback?: PoseLandmarkerCallback): void { + callback?: PoseLandmarkerCallback): PoseLandmarkerResult|void { const imageProcessingOptions = typeof imageProcessingOptionsOrCallback !== 'function' ? imageProcessingOptionsOrCallback : @@ -242,59 +268,94 @@ export class PoseLandmarker extends VisionTaskRunner { this.resetResults(); this.processImageData(image, imageProcessingOptions); - this.userCallback = () => {}; + + if (!this.userCallback) { + return this.result as PoseLandmarkerResult; + } } /** - * Performs pose detection on the provided video frame and waits - * synchronously for the response. Only use this method when the - * PoseLandmarker is created with running mode `video`. + * Performs pose detection on the provided video frame and invokes the + * callback with the response. The method returns synchronously once the + * callback returns. Only use this method when the PoseLandmarker is created + * with running mode `video`. * * @param videoFrame A video frame to process. * @param timestamp The timestamp of the current frame, in ms. * @param callback The callback that is invoked with the result. The * lifetime of the returned masks is only guaranteed for the duration of * the callback. - * @return The detected pose landmarks. */ detectForVideo( videoFrame: ImageSource, timestamp: number, callback: PoseLandmarkerCallback): void; /** - * Performs pose detection on the provided video frame and waits - * synchronously for the response. Only use this method when the - * PoseLandmarker is created with running mode `video`. + * Performs pose detection on the provided video frame and invokes the + * callback with the response. The method returns synchronously once the + * callback returns. Only use this method when the PoseLandmarker is created + * with running mode `video`. * * @param videoFrame A video frame to process. + * @param timestamp The timestamp of the current frame, in ms. * @param imageProcessingOptions the `ImageProcessingOptions` specifying how * to process the input image before running inference. - * @param timestamp The timestamp of the current frame, in ms. 
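As with the segmenters, `detect()` now has both callback and result-returning overloads. A hedged sketch of the two styles, assuming an existing image-mode `poseLandmarker` and an `HTMLImageElement` named `image`:

```ts
// Callback style: any segmentation masks are only valid inside the callback.
poseLandmarker.detect(image, result => {
  console.log(result.landmarks.length, 'pose(s) detected');
});

// Result style: masks are copied so the result can outlive the call;
// close any returned segmentation masks when finished with them.
const result = poseLandmarker.detect(image);
result.segmentationMasks?.forEach(mask => mask.close());
```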
* @param callback The callback that is invoked with the result. The * lifetime of the returned masks is only guaranteed for the duration of * the callback. - * @return The detected pose landmarks. */ detectForVideo( - videoFrame: ImageSource, imageProcessingOptions: ImageProcessingOptions, - timestamp: number, callback: PoseLandmarkerCallback): void; + videoFrame: ImageSource, timestamp: number, + imageProcessingOptions: ImageProcessingOptions, + callback: PoseLandmarkerCallback): void; + /** + * Performs pose detection on the provided video frame and returns the result. + * This method creates a copy of the resulting masks and should not be used + * in high-throughput applictions. Only use this method when the + * PoseLandmarker is created with running mode `video`. + * + * @param videoFrame A video frame to process. + * @param timestamp The timestamp of the current frame, in ms. + * @return The landmarker result. Any masks are copied to extend the + * lifetime of the returned data. + */ + detectForVideo(videoFrame: ImageSource, timestamp: number): + PoseLandmarkerResult; + /** + * Performs pose detection on the provided video frame and returns the result. + * This method creates a copy of the resulting masks and should not be used + * in high-throughput applictions. The method returns synchronously once the + * callback returns. Only use this method when the PoseLandmarker is created + * with running mode `video`. + * + * @param videoFrame A video frame to process. + * @param timestamp The timestamp of the current frame, in ms. + * @param imageProcessingOptions the `ImageProcessingOptions` specifying how + * to process the input image before running inference. + * @return The landmarker result. Any masks are copied to extend the lifetime + * of the returned data. + */ detectForVideo( - videoFrame: ImageSource, - timestampOrImageProcessingOptions: number|ImageProcessingOptions, - timestampOrCallback: number|PoseLandmarkerCallback, - callback?: PoseLandmarkerCallback): void { + videoFrame: ImageSource, timestamp: number, + imageProcessingOptions: ImageProcessingOptions): PoseLandmarkerResult; + detectForVideo( + videoFrame: ImageSource, timestamp: number, + imageProcessingOptionsOrCallback?: ImageProcessingOptions| + PoseLandmarkerCallback, + callback?: PoseLandmarkerCallback): PoseLandmarkerResult|void { const imageProcessingOptions = - typeof timestampOrImageProcessingOptions !== 'number' ? - timestampOrImageProcessingOptions : + typeof imageProcessingOptionsOrCallback !== 'function' ? + imageProcessingOptionsOrCallback : {}; - const timestamp = typeof timestampOrImageProcessingOptions === 'number' ? - timestampOrImageProcessingOptions : - timestampOrCallback as number; - this.userCallback = typeof timestampOrCallback === 'function' ? - timestampOrCallback : - callback!; + this.userCallback = typeof imageProcessingOptionsOrCallback === 'function' ? 
+ imageProcessingOptionsOrCallback : + callback; + this.resetResults(); this.processVideoData(videoFrame, imageProcessingOptions, timestamp); - this.userCallback = () => {}; + + if (!this.userCallback) { + return this.result as PoseLandmarkerResult; + } } private resetResults(): void { @@ -309,13 +370,13 @@ export class PoseLandmarker extends VisionTaskRunner { if (!('worldLandmarks' in this.result)) { return; } - if (!('landmarks' in this.result)) { - return; - } if (this.outputSegmentationMasks && !('segmentationMasks' in this.result)) { return; } - this.userCallback(this.result as Required); + + if (this.userCallback) { + this.userCallback(this.result as Required); + } } /** Sets the default values for the graph. */ @@ -332,10 +393,11 @@ export class PoseLandmarker extends VisionTaskRunner { * Converts raw data into a landmark, and adds it to our landmarks list. */ private addJsLandmarks(data: Uint8Array[]): void { + this.result.landmarks = []; for (const binaryProto of data) { const poseLandmarksProto = NormalizedLandmarkList.deserializeBinary(binaryProto); - this.result.landmarks = convertToLandmarks(poseLandmarksProto); + this.result.landmarks.push(convertToLandmarks(poseLandmarksProto)); } } @@ -344,24 +406,12 @@ export class PoseLandmarker extends VisionTaskRunner { * worldLandmarks list. */ private adddJsWorldLandmarks(data: Uint8Array[]): void { + this.result.worldLandmarks = []; for (const binaryProto of data) { const poseWorldLandmarksProto = LandmarkList.deserializeBinary(binaryProto); - this.result.worldLandmarks = - convertToWorldLandmarks(poseWorldLandmarksProto); - } - } - - /** - * Converts raw data into a landmark, and adds it to our auxilary - * landmarks list. - */ - private addJsAuxiliaryLandmarks(data: Uint8Array[]): void { - for (const binaryProto of data) { - const auxiliaryLandmarksProto = - NormalizedLandmarkList.deserializeBinary(binaryProto); - this.result.auxilaryLandmarks = - convertToLandmarks(auxiliaryLandmarksProto); + this.result.worldLandmarks.push( + convertToWorldLandmarks(poseWorldLandmarksProto)); } } @@ -372,7 +422,6 @@ export class PoseLandmarker extends VisionTaskRunner { graphConfig.addInputStream(NORM_RECT_STREAM); graphConfig.addOutputStream(NORM_LANDMARKS_STREAM); graphConfig.addOutputStream(WORLD_LANDMARKS_STREAM); - graphConfig.addOutputStream(AUXILIARY_LANDMARKS_STREAM); graphConfig.addOutputStream(SEGMENTATION_MASK_STREAM); const calculatorOptions = new CalculatorOptions(); @@ -385,8 +434,6 @@ export class PoseLandmarker extends VisionTaskRunner { landmarkerNode.addInputStream('NORM_RECT:' + NORM_RECT_STREAM); landmarkerNode.addOutputStream('NORM_LANDMARKS:' + NORM_LANDMARKS_STREAM); landmarkerNode.addOutputStream('WORLD_LANDMARKS:' + WORLD_LANDMARKS_STREAM); - landmarkerNode.addOutputStream( - 'AUXILIARY_LANDMARKS:' + AUXILIARY_LANDMARKS_STREAM); landmarkerNode.setOptions(calculatorOptions); graphConfig.addNode(landmarkerNode); @@ -417,26 +464,14 @@ export class PoseLandmarker extends VisionTaskRunner { this.maybeInvokeCallback(); }); - this.graphRunner.attachProtoVectorListener( - AUXILIARY_LANDMARKS_STREAM, (binaryProto, timestamp) => { - this.addJsAuxiliaryLandmarks(binaryProto); - this.setLatestOutputTimestamp(timestamp); - this.maybeInvokeCallback(); - }); - this.graphRunner.attachEmptyPacketListener( - AUXILIARY_LANDMARKS_STREAM, timestamp => { - this.result.auxilaryLandmarks = []; - this.setLatestOutputTimestamp(timestamp); - this.maybeInvokeCallback(); - }); - if (this.outputSegmentationMasks) { landmarkerNode.addOutputStream( 
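Two behavioral notes fall out of the changes above: `detectForVideo()` now takes the timestamp before the optional `ImageProcessingOptions`, and `landmarks`/`worldLandmarks` are now nested per detected pose. A sketch of video-mode usage reflecting both, assuming an existing video-mode `poseLandmarker`:

```ts
// New argument order: videoFrame, timestamp (ms), then optional processing options.
function onVideoFrame(video: HTMLVideoElement) {
  const result = poseLandmarker.detectForVideo(video, performance.now());

  // Results are now grouped per detected pose.
  result.landmarks.forEach((poseLandmarks, poseIndex) => {
    console.log(`pose ${poseIndex}: ${poseLandmarks.length} landmarks`);
  });
  result.worldLandmarks.forEach((poseWorldLandmarks, poseIndex) => {
    console.log(`pose ${poseIndex}: first z=${poseWorldLandmarks[0]?.z}`);
  });
}
```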
'SEGMENTATION_MASK:' + SEGMENTATION_MASK_STREAM); this.graphRunner.attachImageVectorListener( SEGMENTATION_MASK_STREAM, (masks, timestamp) => { - this.result.segmentationMasks = - masks.map(wasmImage => this.convertToMPImage(wasmImage)); + this.result.segmentationMasks = masks.map( + wasmImage => this.convertToMPMask( + wasmImage, /* shouldCopyData= */ !this.userCallback)); this.setLatestOutputTimestamp(timestamp); this.maybeInvokeCallback(); }); diff --git a/mediapipe/tasks/web/vision/pose_landmarker/pose_landmarker_result.d.ts b/mediapipe/tasks/web/vision/pose_landmarker/pose_landmarker_result.d.ts index 66d0498a6..96e698a85 100644 --- a/mediapipe/tasks/web/vision/pose_landmarker/pose_landmarker_result.d.ts +++ b/mediapipe/tasks/web/vision/pose_landmarker/pose_landmarker_result.d.ts @@ -16,7 +16,7 @@ import {Category} from '../../../../tasks/web/components/containers/category'; import {Landmark, NormalizedLandmark} from '../../../../tasks/web/components/containers/landmark'; -import {MPImage} from '../../../../tasks/web/vision/core/image'; +import {MPMask} from '../../../../tasks/web/vision/core/mask'; export {Category, Landmark, NormalizedLandmark}; @@ -26,14 +26,11 @@ export {Category, Landmark, NormalizedLandmark}; */ export declare interface PoseLandmarkerResult { /** Pose landmarks of detected poses. */ - landmarks: NormalizedLandmark[]; + landmarks: NormalizedLandmark[][]; /** Pose landmarks in world coordinates of detected poses. */ - worldLandmarks: Landmark[]; - - /** Detected auxiliary landmarks, used for deriving ROI for next frame. */ - auxilaryLandmarks: NormalizedLandmark[]; + worldLandmarks: Landmark[][]; /** Segmentation mask for the detected pose. */ - segmentationMasks?: MPImage[]; + segmentationMasks?: MPMask[]; } diff --git a/mediapipe/tasks/web/vision/pose_landmarker/pose_landmarker_test.ts b/mediapipe/tasks/web/vision/pose_landmarker/pose_landmarker_test.ts index 794df68b8..d4a49db97 100644 --- a/mediapipe/tasks/web/vision/pose_landmarker/pose_landmarker_test.ts +++ b/mediapipe/tasks/web/vision/pose_landmarker/pose_landmarker_test.ts @@ -18,7 +18,7 @@ import 'jasmine'; import {CalculatorGraphConfig} from '../../../../framework/calculator_pb'; import {createLandmarks, createWorldLandmarks} from '../../../../tasks/web/components/processors/landmark_result_test_lib'; import {addJasmineCustomFloatEqualityTester, createSpyWasmModule, MediapipeTasksFake, SpyWasmModule, verifyGraph} from '../../../../tasks/web/core/task_runner_test_utils'; -import {MPImage} from '../../../../tasks/web/vision/core/image'; +import {MPMask} from '../../../../tasks/web/vision/core/mask'; import {VisionGraphRunner} from '../../../../tasks/web/vision/core/vision_task_runner'; import {PoseLandmarker} from './pose_landmarker'; @@ -45,8 +45,7 @@ class PoseLandmarkerFake extends PoseLandmarker implements MediapipeTasksFake { this.attachListenerSpies[0] = spyOn(this.graphRunner, 'attachProtoVectorListener') .and.callFake((stream, listener) => { - expect(stream).toMatch( - /(normalized_landmarks|world_landmarks|auxiliary_landmarks)/); + expect(stream).toMatch(/(normalized_landmarks|world_landmarks)/); this.listeners.set(stream, listener as PacketListener); }); this.attachListenerSpies[1] = @@ -80,23 +79,23 @@ describe('PoseLandmarker', () => { it('initializes graph', async () => { verifyGraph(poseLandmarker); - expect(poseLandmarker.listeners).toHaveSize(3); + expect(poseLandmarker.listeners).toHaveSize(2); }); it('reloads graph when settings are changed', async () => { await 
poseLandmarker.setOptions({numPoses: 1}); verifyGraph(poseLandmarker, [['poseDetectorGraphOptions', 'numPoses'], 1]); - expect(poseLandmarker.listeners).toHaveSize(3); + expect(poseLandmarker.listeners).toHaveSize(2); await poseLandmarker.setOptions({numPoses: 5}); verifyGraph(poseLandmarker, [['poseDetectorGraphOptions', 'numPoses'], 5]); - expect(poseLandmarker.listeners).toHaveSize(3); + expect(poseLandmarker.listeners).toHaveSize(2); }); it('registers listener for segmentation masks', async () => { - expect(poseLandmarker.listeners).toHaveSize(3); + expect(poseLandmarker.listeners).toHaveSize(2); await poseLandmarker.setOptions({outputSegmentationMasks: true}); - expect(poseLandmarker.listeners).toHaveSize(4); + expect(poseLandmarker.listeners).toHaveSize(3); }); it('merges options', async () => { @@ -209,8 +208,6 @@ describe('PoseLandmarker', () => { (landmarksProto, 1337); poseLandmarker.listeners.get('world_landmarks')! (worldLandmarksProto, 1337); - poseLandmarker.listeners.get('auxiliary_landmarks')! - (landmarksProto, 1337); poseLandmarker.listeners.get('segmentation_masks')!(masks, 1337); }); @@ -222,10 +219,9 @@ describe('PoseLandmarker', () => { .toHaveBeenCalledTimes(1); expect(poseLandmarker.fakeWasmModule._waitUntilIdle).toHaveBeenCalled(); - expect(result.landmarks).toEqual([{'x': 0, 'y': 0, 'z': 0}]); - expect(result.worldLandmarks).toEqual([{'x': 0, 'y': 0, 'z': 0}]); - expect(result.auxilaryLandmarks).toEqual([{'x': 0, 'y': 0, 'z': 0}]); - expect(result.segmentationMasks![0]).toBeInstanceOf(MPImage); + expect(result.landmarks).toEqual([[{'x': 0, 'y': 0, 'z': 0}]]); + expect(result.worldLandmarks).toEqual([[{'x': 0, 'y': 0, 'z': 0}]]); + expect(result.segmentationMasks![0]).toBeInstanceOf(MPMask); done(); }); }); @@ -240,8 +236,6 @@ describe('PoseLandmarker', () => { (landmarksProto, 1337); poseLandmarker.listeners.get('world_landmarks')! (worldLandmarksProto, 1337); - poseLandmarker.listeners.get('auxiliary_landmarks')! - (landmarksProto, 1337); }); // Invoke the pose landmarker twice @@ -261,7 +255,39 @@ describe('PoseLandmarker', () => { expect(landmarks1).toEqual(landmarks2); }); - it('invokes listener once masks are avaiblae', (done) => { + it('supports multiple poses', (done) => { + const landmarksProto = [ + createLandmarks(0.1, 0.2, 0.3).serializeBinary(), + createLandmarks(0.4, 0.5, 0.6).serializeBinary() + ]; + const worldLandmarksProto = [ + createWorldLandmarks(1, 2, 3).serializeBinary(), + createWorldLandmarks(4, 5, 6).serializeBinary() + ]; + + poseLandmarker.setOptions({numPoses: 1}); + + // Pass the test data to our listener + poseLandmarker.fakeWasmModule._waitUntilIdle.and.callFake(() => { + poseLandmarker.listeners.get('normalized_landmarks')! + (landmarksProto, 1337); + poseLandmarker.listeners.get('world_landmarks')! + (worldLandmarksProto, 1337); + }); + + // Invoke the pose landmarker + poseLandmarker.detect({} as HTMLImageElement, result => { + expect(result.landmarks).toEqual([ + [{'x': 0.1, 'y': 0.2, 'z': 0.3}], [{'x': 0.4, 'y': 0.5, 'z': 0.6}] + ]); + expect(result.worldLandmarks).toEqual([ + [{'x': 1, 'y': 2, 'z': 3}], [{'x': 4, 'y': 5, 'z': 6}] + ]); + done(); + }); + }); + + it('invokes listener once masks are available', (done) => { const landmarksProto = [createLandmarks().serializeBinary()]; const worldLandmarksProto = [createWorldLandmarks().serializeBinary()]; const masks = [ @@ -281,8 +307,6 @@ describe('PoseLandmarker', () => { poseLandmarker.listeners.get('world_landmarks')! 
(worldLandmarksProto, 1337); expect(listenerCalled).toBeFalse(); - poseLandmarker.listeners.get('auxiliary_landmarks')! - (landmarksProto, 1337); expect(listenerCalled).toBeFalse(); poseLandmarker.listeners.get('segmentation_masks')!(masks, 1337); expect(listenerCalled).toBeTrue(); @@ -294,4 +318,23 @@ describe('PoseLandmarker', () => { listenerCalled = true; }); }); + + it('returns result', () => { + const landmarksProto = [createLandmarks().serializeBinary()]; + const worldLandmarksProto = [createWorldLandmarks().serializeBinary()]; + + // Pass the test data to our listener + poseLandmarker.fakeWasmModule._waitUntilIdle.and.callFake(() => { + poseLandmarker.listeners.get('normalized_landmarks')! + (landmarksProto, 1337); + poseLandmarker.listeners.get('world_landmarks')! + (worldLandmarksProto, 1337); + }); + + // Invoke the pose landmarker + const result = poseLandmarker.detect({} as HTMLImageElement); + expect(poseLandmarker.fakeWasmModule._waitUntilIdle).toHaveBeenCalled(); + expect(result.landmarks).toEqual([[{'x': 0, 'y': 0, 'z': 0}]]); + expect(result.worldLandmarks).toEqual([[{'x': 0, 'y': 0, 'z': 0}]]); + }); }); diff --git a/mediapipe/tasks/web/vision/types.ts b/mediapipe/tasks/web/vision/types.ts index 164276bab..760b97b77 100644 --- a/mediapipe/tasks/web/vision/types.ts +++ b/mediapipe/tasks/web/vision/types.ts @@ -16,7 +16,8 @@ export * from '../../../tasks/web/core/fileset_resolver'; export * from '../../../tasks/web/vision/core/drawing_utils'; -export {MPImage, MPImageChannelConverter, MPImageType} from '../../../tasks/web/vision/core/image'; +export {MPImage} from '../../../tasks/web/vision/core/image'; +export {MPMask} from '../../../tasks/web/vision/core/mask'; export * from '../../../tasks/web/vision/face_detector/face_detector'; export * from '../../../tasks/web/vision/face_landmarker/face_landmarker'; export * from '../../../tasks/web/vision/face_stylizer/face_stylizer'; diff --git a/mediapipe/web/graph_runner/graph_runner_image_lib.ts b/mediapipe/web/graph_runner/graph_runner_image_lib.ts index 8b491d891..d2d6e52a8 100644 --- a/mediapipe/web/graph_runner/graph_runner_image_lib.ts +++ b/mediapipe/web/graph_runner/graph_runner_image_lib.ts @@ -10,7 +10,7 @@ type LibConstructor = new (...args: any[]) => GraphRunner; /** An image returned from a MediaPipe graph. */ export interface WasmImage { - data: Uint8ClampedArray|Float32Array|WebGLTexture; + data: Uint8Array|Float32Array|WebGLTexture; width: number; height: number; } diff --git a/third_party/BUILD b/third_party/BUILD index 7522bab1b..60fa73799 100644 --- a/third_party/BUILD +++ b/third_party/BUILD @@ -13,6 +13,9 @@ # limitations under the License. 
# +load("@rules_foreign_cc//tools/build_defs:cmake.bzl", "cmake_external") +load("@bazel_skylib//:bzl_library.bzl", "bzl_library") + licenses(["notice"]) # Apache License 2.0 exports_files(["LICENSE"]) @@ -61,16 +64,73 @@ config_setting( visibility = ["//visibility:public"], ) +config_setting( + name = "opencv_ios_arm64_source_build", + define_values = { + "OPENCV": "source", + }, + values = { + "apple_platform_type": "ios", + "cpu": "ios_arm64", + }, +) + +config_setting( + name = "opencv_ios_sim_arm64_source_build", + define_values = { + "OPENCV": "source", + }, + values = { + "apple_platform_type": "ios", + "cpu": "ios_sim_arm64", + }, +) + +config_setting( + name = "opencv_ios_x86_64_source_build", + define_values = { + "OPENCV": "source", + }, + values = { + "apple_platform_type": "ios", + "cpu": "ios_x86_64", + }, +) + +config_setting( + name = "opencv_ios_sim_fat_source_build", + define_values = { + "OPENCV": "source", + }, + values = { + "apple_platform_type": "ios", + "ios_multi_cpus": "sim_arm64, x86_64", + }, +) + alias( name = "opencv", actual = select({ ":opencv_source_build": ":opencv_cmake", + ":opencv_ios_sim_arm64_source_build": "@ios_opencv_source//:opencv", + ":opencv_ios_sim_fat_source_build": "@ios_opencv_source//:opencv", + ":opencv_ios_arm64_source_build": "@ios_opencv_source//:opencv", "//conditions:default": ":opencv_binary", }), visibility = ["//visibility:public"], ) -load("@rules_foreign_cc//tools/build_defs:cmake.bzl", "cmake_external") +bzl_library( + name = "opencv_ios_xcframework_files_bzl", + srcs = ["opencv_ios_xcframework_files.bzl"], + visibility = ["//visibility:private"], +) + +bzl_library( + name = "opencv_ios_source_bzl", + srcs = ["opencv_ios_source.bzl"], + visibility = ["//visibility:private"], +) # Note: this determines the order in which the libraries are passed to the # linker, so if library A depends on library B, library B must come _after_. 
diff --git a/third_party/external_files.bzl b/third_party/external_files.bzl index af9361bb3..652a2947f 100644 --- a/third_party/external_files.bzl +++ b/third_party/external_files.bzl @@ -204,8 +204,8 @@ def external_files(): http_file( name = "com_google_mediapipe_conv2d_input_channel_1_tflite", - sha256 = "126edac445967799f3b8b124d15483b1506f6d6cb57a501c1636eb8f2fb3734f", - urls = ["https://storage.googleapis.com/mediapipe-assets/conv2d_input_channel_1.tflite?generation=1678218348519744"], + sha256 = "ccb667092f3aed3a35a57fb3478fecc0c8f6360dbf477a9db9c24e5b3ec4273e", + urls = ["https://storage.googleapis.com/mediapipe-assets/conv2d_input_channel_1.tflite?generation=1683252905577703"], ) http_file( @@ -246,8 +246,8 @@ def external_files(): http_file( name = "com_google_mediapipe_dense_tflite", - sha256 = "be9323068461b1cbf412692ee916be30dcb1a5fb59a9ee875d470bc340d9e869", - urls = ["https://storage.googleapis.com/mediapipe-assets/dense.tflite?generation=1678218351373709"], + sha256 = "6795e7c3a263f44e97be048a5e1166e0921b453bfbaf037f4f69ac5c059ee945", + urls = ["https://storage.googleapis.com/mediapipe-assets/dense.tflite?generation=1683252907920466"], ) http_file( @@ -960,8 +960,8 @@ def external_files(): http_file( name = "com_google_mediapipe_portrait_selfie_segmentation_expected_category_mask_jpg", - sha256 = "d8f20fa746e14067f668dd293f21bbc50ec81196d186386a6ded1278c3ec8f46", - urls = ["https://storage.googleapis.com/mediapipe-assets/portrait_selfie_segmentation_expected_category_mask.jpg?generation=1678606935088873"], + sha256 = "1400c6fccf3805bfd1644d7ed9be98dfa4f900e1720838c566963f8d9f10f5d0", + urls = ["https://storage.googleapis.com/mediapipe-assets/portrait_selfie_segmentation_expected_category_mask.jpg?generation=1683332555306471"], ) http_file( @@ -972,8 +972,8 @@ def external_files(): http_file( name = "com_google_mediapipe_portrait_selfie_segmentation_landscape_expected_category_mask_jpg", - sha256 = "f5c3fa3d93f8e7289b69b8a89c2519276dfa5014dcc50ed6e86e8cd4d4ae7f27", - urls = ["https://storage.googleapis.com/mediapipe-assets/portrait_selfie_segmentation_landscape_expected_category_mask.jpg?generation=1678606939469429"], + sha256 = "a208aeeeb615fd40046d883e2c7982458e1b12edd6526e88c305c4053b0a9399", + urls = ["https://storage.googleapis.com/mediapipe-assets/portrait_selfie_segmentation_landscape_expected_category_mask.jpg?generation=1683332557473435"], ) http_file( @@ -1158,14 +1158,14 @@ def external_files(): http_file( name = "com_google_mediapipe_selfie_segmentation_landscape_tflite", - sha256 = "28fb4c287d6295a2dba6c1f43b43315a37f927ddcd6693d635d625d176eef162", - urls = ["https://storage.googleapis.com/mediapipe-assets/selfie_segmentation_landscape.tflite?generation=1678775102234495"], + sha256 = "a77d03f4659b9f6b6c1f5106947bf40e99d7655094b6527f214ea7d451106edd", + urls = ["https://storage.googleapis.com/mediapipe-assets/selfie_segmentation_landscape.tflite?generation=1683332561312022"], ) http_file( name = "com_google_mediapipe_selfie_segmentation_tflite", - sha256 = "b0e2ec6f95107795b952b27f3d92806b45f0bc069dac76dcd264cd1b90d61c6c", - urls = ["https://storage.googleapis.com/mediapipe-assets/selfie_segmentation.tflite?generation=1678775104900954"], + sha256 = "9ee168ec7c8f2a16c56fe8e1cfbc514974cbbb7e434051b455635f1bd1462f5c", + urls = ["https://storage.googleapis.com/mediapipe-assets/selfie_segmentation.tflite?generation=1683332563830600"], ) http_file( diff --git a/third_party/opencv_ios_source.BUILD b/third_party/opencv_ios_source.BUILD new file mode 100644 index 
000000000..c0cb65908 --- /dev/null +++ b/third_party/opencv_ios_source.BUILD @@ -0,0 +1,125 @@ +# Description: +# OpenCV xcframework for video/image processing on iOS. + +licenses(["notice"]) # BSD license + +exports_files(["LICENSE"]) + +load( + "@build_bazel_rules_apple//apple:apple.bzl", + "apple_static_xcframework_import", +) +load( + "@//third_party:opencv_ios_source.bzl", + "select_headers", + "unzip_opencv_xcframework", +) + +# Build opencv2.xcframework from source using a convenience script provided in +# OPENCV sources and zip the xcframework. We only build the modules required by MediaPipe by specifying +# the modules to be ignored as command line arguments. +# We also specify the simulator and device architectures we are building for. +# Currently we only support iOS arm64 (M1 Macs) and x86_64(Intel Macs) simulators +# and arm64 iOS devices. +# Bitcode and Swift support. Swift support will be added in when the final binary +# for MediaPipe iOS Task libraries are built. Shipping with OPENCV built with +# Swift support throws linker errors when the MediaPipe framework is used from +# an iOS project. +genrule( + name = "build_opencv_xcframework", + srcs = glob(["opencv-4.5.1/**"]), + outs = ["opencv2.xcframework.zip"], + cmd = "&&".join([ + "$(location opencv-4.5.1/platforms/apple/build_xcframework.py) \ + --iphonesimulator_archs arm64,x86_64 \ + --iphoneos_archs arm64 \ + --without dnn \ + --without ml \ + --without stitching \ + --without photo \ + --without objdetect \ + --without gapi \ + --without flann \ + --disable PROTOBUF \ + --disable-bitcode \ + --disable-swift \ + --build_only_specified_archs \ + --out $(@D)", + "cd $(@D)", + "zip --symlinks -r opencv2.xcframework.zip opencv2.xcframework", + ]), +) + +# Unzips `opencv2.xcframework.zip` built from source by `build_opencv_xcframework` +# genrule and returns an exhaustive list of all its files including symlinks. +unzip_opencv_xcframework( + name = "opencv2_unzipped_xcframework_files", + zip_file = "opencv2.xcframework.zip", +) + +# Imports the files of the unzipped `opencv2.xcframework` as an apple static +# framework which can be linked to iOS targets. +apple_static_xcframework_import( + name = "opencv_xcframework", + visibility = ["//visibility:public"], + xcframework_imports = [":opencv2_unzipped_xcframework_files"], +) + +# Filters the headers for each platform in `opencv2.xcframework` which will be +# used as headers in a `cc_library` that can be linked to C++ targets. +select_headers( + name = "opencv_xcframework_device_headers", + srcs = [":opencv_xcframework"], + platform = "ios-arm64", +) + +select_headers( + name = "opencv_xcframework_simulator_headers", + srcs = [":opencv_xcframework"], + platform = "ios-arm64_x86_64-simulator", +) + +# `cc_library` that can be linked to C++ targets to import opencv headers. +cc_library( + name = "opencv", + hdrs = select({ + "@//mediapipe:ios_x86_64": [ + ":opencv_xcframework_simulator_headers", + ], + "@//mediapipe:ios_sim_arm64": [ + ":opencv_xcframework_simulator_headers", + ], + "@//mediapipe:ios_arm64": [ + ":opencv_xcframework_simulator_headers", + ], + # A value from above is chosen arbitarily. 
+ "//conditions:default": [ + ":opencv_xcframework_simulator_headers", + ], + }), + copts = [ + "-std=c++11", + "-x objective-c++", + ], + include_prefix = "opencv2", + linkopts = [ + "-framework AssetsLibrary", + "-framework CoreFoundation", + "-framework CoreGraphics", + "-framework CoreMedia", + "-framework Accelerate", + "-framework CoreImage", + "-framework AVFoundation", + "-framework CoreVideo", + "-framework QuartzCore", + ], + strip_include_prefix = select({ + "@//mediapipe:ios_x86_64": "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers", + "@//mediapipe:ios_sim_arm64": "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers", + "@//mediapipe:ios_arm64": "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers", + # Random value is selected for default cases. + "//conditions:default": "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers", + }), + visibility = ["//visibility:public"], + deps = [":opencv_xcframework"], +) diff --git a/third_party/opencv_ios_source.bzl b/third_party/opencv_ios_source.bzl new file mode 100644 index 000000000..e46fb4cac --- /dev/null +++ b/third_party/opencv_ios_source.bzl @@ -0,0 +1,158 @@ +# Copyright 2023 The MediaPipe Authors. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Custom rules for building iOS OpenCV xcframework from sources.""" + +load( + "@//third_party:opencv_ios_xcframework_files.bzl", + "OPENCV_XCFRAMEWORK_INFO_PLIST_PATH", + "OPENCV_XCFRAMEWORK_IOS_DEVICE_FILE_PATHS", + "OPENCV_XCFRAMEWORK_IOS_SIMULATOR_FILE_PATHS", +) + +_OPENCV_XCFRAMEWORK_DIR_NAME = "opencv2.xcframework" +_OPENCV_FRAMEWORK_DIR_NAME = "opencv2.framework" +_OPENCV_SIMULATOR_PLATFORM_DIR_NAME = "ios-arm64_x86_64-simulator" +_OPENCV_DEVICE_PLATFORM_DIR_NAME = "ios-arm64" + +def _select_headers_impl(ctx): + _files = [ + f + for f in ctx.files.srcs + if (f.basename.endswith(".h") or f.basename.endswith(".hpp")) and + f.dirname.find(ctx.attr.platform) != -1 + ] + return [DefaultInfo(files = depset(_files))] + +# This rule selects only the headers from an apple static xcframework filtered by +# an input platform string. +select_headers = rule( + implementation = _select_headers_impl, + attrs = { + "srcs": attr.label_list(mandatory = True, allow_files = True), + "platform": attr.string(mandatory = True), + }, +) + +# This function declares and returns symlinks to the directories within each platform +# in `opencv2.xcframework` expected to be present. +# The symlinks are created according to the structure stipulated by apple xcframeworks +# do that they can be correctly consumed by `apple_static_xcframework_import` rule. 
+def _opencv2_directory_symlinks(ctx, platforms): + basenames = ["Resources", "Headers", "Modules", "Versions/Current"] + symlinks = [] + + for platform in platforms: + symlinks = symlinks + [ + ctx.actions.declare_symlink( + _OPENCV_XCFRAMEWORK_DIR_NAME + "/{}/{}/{}".format(platform, _OPENCV_FRAMEWORK_DIR_NAME, name), + ) + for name in basenames + ] + + return symlinks + +# This function declares and returns all the files for each platform expected +# to be present in `opencv2.xcframework` after the unzipping action is run. +def _opencv2_file_list(ctx, platform_filepath_lists): + binary_name = "opencv2" + output_files = [] + binaries_to_symlink = [] + + for (platform, filepaths) in platform_filepath_lists: + for path in filepaths: + file = ctx.actions.declare_file(path) + output_files.append(file) + if path.endswith(binary_name): + symlink_output = ctx.actions.declare_file( + _OPENCV_XCFRAMEWORK_DIR_NAME + "/{}/{}/{}".format( + platform, + _OPENCV_FRAMEWORK_DIR_NAME, + binary_name, + ), + ) + binaries_to_symlink.append((symlink_output, file)) + + return output_files, binaries_to_symlink + +def _unzip_opencv_xcframework_impl(ctx): + # Array to iterate over the various platforms to declare output files and + # symlinks. + platform_filepath_lists = [ + (_OPENCV_SIMULATOR_PLATFORM_DIR_NAME, OPENCV_XCFRAMEWORK_IOS_SIMULATOR_FILE_PATHS), + (_OPENCV_DEVICE_PLATFORM_DIR_NAME, OPENCV_XCFRAMEWORK_IOS_DEVICE_FILE_PATHS), + ] + + # Gets an exhaustive list of output files which are present in the xcframework. + # Also gets array of `(binary simlink, binary)` pairs which are to be symlinked + # using `ctx.actions.symlink()`. + output_files, binaries_to_symlink = _opencv2_file_list(ctx, platform_filepath_lists) + output_files.append(ctx.actions.declare_file(OPENCV_XCFRAMEWORK_INFO_PLIST_PATH)) + + # xcframeworks have a directory structure in which the `opencv2.framework` folders for each + # platform contain directories which are symlinked to the respective folders of the version + # in use. Simply unzipping the zip of the framework will not make Bazel treat these + # as symlinks. They have to be explicity declared as symlinks using `ctx.actions.declare_symlink()`. + directory_symlinks = _opencv2_directory_symlinks( + ctx, + [_OPENCV_SIMULATOR_PLATFORM_DIR_NAME, _OPENCV_DEVICE_PLATFORM_DIR_NAME], + ) + + output_files = output_files + directory_symlinks + + args = ctx.actions.args() + + # Add the path of the zip file to be unzipped as an argument to be passed to + # `run_shell` action. + args.add(ctx.file.zip_file.path) + + # Add the path to the directory in which the framework is to be unzipped to. + args.add(ctx.file.zip_file.dirname) + + ctx.actions.run_shell( + inputs = [ctx.file.zip_file], + outputs = output_files, + arguments = [args], + progress_message = "Unzipping %s" % ctx.file.zip_file.short_path, + command = "unzip -qq $1 -d $2", + ) + + # The symlinks of the opencv2 binaries for each platform in the xcframework + # have to be symlinked using the `ctx.actions.symlink` unlike the directory + # symlinks which can be expected to be valid when unzipping is completed. + # Otherwise, when tests are run, the linker complaints that the binary is + # not found. + binary_symlink_files = [] + for (symlink_output, binary_file) in binaries_to_symlink: + ctx.actions.symlink(output = symlink_output, target_file = binary_file) + binary_symlink_files.append(symlink_output) + + # Return all the declared output files and symlinks as the output of this + # rule. 
+ return [DefaultInfo(files = depset(output_files + binary_symlink_files))] + +# This rule unzips an `opencv2.xcframework.zip` created by a genrule that +# invokes a python script in the opencv 4.5.1 github archive. +# It returns all the contents of opencv2.xcframework as a list of files in the +# output. This rule works by explicitly declaring files at hardcoded +# paths in the opencv2 xcframework bundle which are expected to be present when +# the zip file is unzipped. This is a prerequisite since the outputs of this rule +# will be consumed by apple_static_xcframework_import which can only take a list +# of files as inputs. +unzip_opencv_xcframework = rule( + implementation = _unzip_opencv_xcframework_impl, + attrs = { + "zip_file": attr.label(mandatory = True, allow_single_file = True), + }, +) diff --git a/third_party/opencv_ios_xcframework_files.bzl b/third_party/opencv_ios_xcframework_files.bzl new file mode 100644 index 000000000..f3ea23883 --- /dev/null +++ b/third_party/opencv_ios_xcframework_files.bzl @@ -0,0 +1,468 @@ +# Copyright 2023 The MediaPipe Authors. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""List of file paths in the `opencv2.xcframework` bundle.""" + +OPENCV_XCFRAMEWORK_INFO_PLIST_PATH = "opencv2.xcframework/Info.plist" + +OPENCV_XCFRAMEWORK_IOS_SIMULATOR_FILE_PATHS = [ + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Resources/Info.plist", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Moments.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/imgproc.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfRect2d.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfFloat4.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfPoint2i.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/video/tracking.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/video/legacy/constants_c.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/video/background_segm.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/video/video.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Double3.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfByte.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Range.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Core.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Size2f.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/world.hpp", + 
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/opencv2-Swift.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/fast_math.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda_types.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/check.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cv_cpu_dispatch.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/utility.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/softfloat.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cv_cpu_helper.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cvstd.inl.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/msa_macros.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/intrin.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/intrin_rvv.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/simd_utils.impl.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/intrin_wasm.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/intrin_neon.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/intrin_avx.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/intrin_avx512.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/intrin_vsx.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/interface.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/intrin_msa.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/intrin_cpp.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/intrin_forward.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/intrin_sse.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/intrin_sse_em.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/hal/hal.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/async.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/bufferpool.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/ovx.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/optim.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/va_intel.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cvdef.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/warp.hpp", + 
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/filters.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/dynamic_smem.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/reduce.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/utility.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/warp_shuffle.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/border_interpolate.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/transform.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/saturate_cast.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/vec_math.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/functional.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/limits.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/type_traits.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/vec_distance.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/block.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/detail/reduce.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/detail/reduce_key_val.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/detail/color_detail.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/detail/type_traits_detail.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/detail/vec_distance_detail.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/detail/transform_detail.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/emulation.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/color.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/datamov_utils.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/funcattrib.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/common.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/vec_traits.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/simd_functions.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/warp_reduce.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda/scan.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/traits.hpp", + 
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opengl.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cvstd_wrapper.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda.inl.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/eigen.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda_stream_accessor.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/ocl.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cuda.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/affine.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/mat.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/utils/logger.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/utils/allocator_stats.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/utils/allocator_stats.impl.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/utils/logtag.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/utils/filesystem.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/utils/tls.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/utils/trace.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/utils/instrumentation.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/utils/logger.defines.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/quaternion.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/neon_utils.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/sse_utils.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/version.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/opencl_info.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_gl.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_svm_definitions.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_svm_hsa_extension.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_clamdblas.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_core.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_svm_20.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_core_wrappers.hpp", + 
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_gl_wrappers.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_clamdfft.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/runtime/autogenerated/opencl_gl.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/runtime/autogenerated/opencl_clamdblas.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/runtime/autogenerated/opencl_core.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/runtime/autogenerated/opencl_core_wrappers.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/runtime/autogenerated/opencl_gl_wrappers.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/runtime/autogenerated/opencl_clamdfft.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/ocl_defs.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/opencl/opencl_svm.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/ocl_genbase.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/detail/async_promise.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/detail/exception_ptr.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/simd_intrinsics.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/matx.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/directx.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/base.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/operations.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/vsx_utils.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/persistence.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/mat.inl.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/types_c.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/cvstd.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/types.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/bindings_utils.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/quaternion.inl.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/saturate.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/core_c.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core/core.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Converters.h", + 
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Mat.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Algorithm.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/opencv.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Mat+Converters.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/ByteVector.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/imgproc/imgproc.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/imgproc/imgproc_c.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/imgproc/hal/interface.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/imgproc/hal/hal.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/imgproc/detail/gcgraph.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/imgproc/types_c.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/highgui.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/features2d.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Point2f.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/KeyPoint.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Rect2f.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Float6.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfKeyPoint.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfRect2i.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/FloatVector.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/TermCriteria.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/opencv2.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Int4.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfDMatch.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Scalar.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Point3f.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfDouble.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/IntVector.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/RotatedRect.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfFloat6.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/cvconfig.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/DoubleVector.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Size2d.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MinMaxLocResult.h", + 
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfInt4.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Rect2i.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Point2i.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfPoint3.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfRotatedRect.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/DMatch.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/TickMeter.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Point3i.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/video.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/imgcodecs/ios.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/imgcodecs/legacy/constants_c.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/imgcodecs/macosx.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/imgcodecs/imgcodecs.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/imgcodecs/imgcodecs_c.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/CvType.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/CVObjcUtil.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Size2i.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/imgcodecs.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Float4.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/videoio/registry.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/videoio/cap_ios.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/videoio/legacy/constants_c.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/videoio/videoio.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/videoio/videoio_c.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfFloat.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Rect2d.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfPoint2f.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Point2d.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/highgui/highgui.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/highgui/highgui_c.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Double2.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/CvCamera2.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/features2d/hal/interface.h", + 
"opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/features2d/features2d.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/videoio.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/opencv_modules.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/core.hpp", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfInt.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/ArrayUtil.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/MatOfPoint3f.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Headers/Point3d.h", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Modules/opencv2.swiftmodule/x86_64-apple-ios-simulator.swiftinterface", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Modules/opencv2.swiftmodule/arm64-apple-ios-simulator.abi.json", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Modules/opencv2.swiftmodule/arm64-apple-ios-simulator.private.swiftinterface", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Modules/opencv2.swiftmodule/x86_64-apple-ios-simulator.swiftdoc", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Modules/opencv2.swiftmodule/Project/arm64-apple-ios-simulator.swiftsourceinfo", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Modules/opencv2.swiftmodule/Project/x86_64-apple-ios-simulator.swiftsourceinfo", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Modules/opencv2.swiftmodule/arm64-apple-ios-simulator.swiftinterface", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Modules/opencv2.swiftmodule/x86_64-apple-ios-simulator.private.swiftinterface", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Modules/opencv2.swiftmodule/arm64-apple-ios-simulator.swiftdoc", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Modules/opencv2.swiftmodule/x86_64-apple-ios-simulator.abi.json", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/Modules/module.modulemap", + "opencv2.xcframework/ios-arm64_x86_64-simulator/opencv2.framework/Versions/A/opencv2", +] + +OPENCV_XCFRAMEWORK_IOS_DEVICE_FILE_PATHS = [ + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Resources/Info.plist", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Moments.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/imgproc.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfRect2d.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfFloat4.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfPoint2i.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/video/tracking.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/video/legacy/constants_c.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/video/background_segm.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/video/video.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Double3.h", + 
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfByte.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Range.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Core.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Size2f.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/world.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/opencv2-Swift.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/fast_math.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda_types.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/check.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cv_cpu_dispatch.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/utility.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/softfloat.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cv_cpu_helper.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cvstd.inl.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/msa_macros.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/intrin.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/intrin_rvv.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/simd_utils.impl.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/intrin_wasm.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/intrin_neon.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/intrin_avx.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/intrin_avx512.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/intrin_vsx.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/interface.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/intrin_msa.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/intrin_cpp.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/intrin_forward.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/intrin_sse.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/intrin_sse_em.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/hal/hal.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/async.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/bufferpool.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/ovx.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/optim.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/va_intel.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cvdef.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/warp.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/filters.hpp", + 
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/dynamic_smem.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/reduce.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/utility.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/warp_shuffle.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/border_interpolate.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/transform.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/saturate_cast.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/vec_math.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/functional.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/limits.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/type_traits.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/vec_distance.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/block.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/detail/reduce.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/detail/reduce_key_val.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/detail/color_detail.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/detail/type_traits_detail.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/detail/vec_distance_detail.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/detail/transform_detail.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/emulation.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/color.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/datamov_utils.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/funcattrib.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/common.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/vec_traits.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/simd_functions.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/warp_reduce.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda/scan.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/traits.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opengl.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cvstd_wrapper.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda.inl.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/eigen.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda_stream_accessor.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/ocl.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cuda.hpp", + 
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/affine.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/mat.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/utils/logger.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/utils/allocator_stats.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/utils/allocator_stats.impl.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/utils/logtag.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/utils/filesystem.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/utils/tls.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/utils/trace.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/utils/instrumentation.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/utils/logger.defines.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/quaternion.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/neon_utils.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/sse_utils.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/version.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/opencl_info.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_gl.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_svm_definitions.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_svm_hsa_extension.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_clamdblas.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_core.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_svm_20.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_core_wrappers.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_gl_wrappers.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/runtime/opencl_clamdfft.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/runtime/autogenerated/opencl_gl.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/runtime/autogenerated/opencl_clamdblas.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/runtime/autogenerated/opencl_core.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/runtime/autogenerated/opencl_core_wrappers.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/runtime/autogenerated/opencl_gl_wrappers.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/runtime/autogenerated/opencl_clamdfft.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/ocl_defs.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/opencl/opencl_svm.hpp", + 
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/ocl_genbase.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/detail/async_promise.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/detail/exception_ptr.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/simd_intrinsics.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/matx.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/directx.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/base.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/operations.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/vsx_utils.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/persistence.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/mat.inl.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/types_c.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/cvstd.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/types.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/bindings_utils.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/quaternion.inl.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/saturate.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/core_c.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core/core.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Converters.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Mat.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Algorithm.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/opencv.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Mat+Converters.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/ByteVector.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/imgproc/imgproc.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/imgproc/imgproc_c.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/imgproc/hal/interface.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/imgproc/hal/hal.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/imgproc/detail/gcgraph.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/imgproc/types_c.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/highgui.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/features2d.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Point2f.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/KeyPoint.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Rect2f.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Float6.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfKeyPoint.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfRect2i.h", + 
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/FloatVector.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/TermCriteria.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/opencv2.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Int4.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfDMatch.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Scalar.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Point3f.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfDouble.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/IntVector.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/RotatedRect.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfFloat6.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/cvconfig.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/DoubleVector.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Size2d.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MinMaxLocResult.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfInt4.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Rect2i.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Point2i.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfPoint3.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfRotatedRect.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/DMatch.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/TickMeter.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Point3i.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/video.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/imgcodecs/ios.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/imgcodecs/legacy/constants_c.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/imgcodecs/macosx.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/imgcodecs/imgcodecs.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/imgcodecs/imgcodecs_c.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/CvType.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/CVObjcUtil.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Size2i.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/imgcodecs.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Float4.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/videoio/registry.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/videoio/cap_ios.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/videoio/legacy/constants_c.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/videoio/videoio.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/videoio/videoio_c.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfFloat.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Rect2d.h", + 
"opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfPoint2f.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Point2d.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/highgui/highgui.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/highgui/highgui_c.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Double2.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/CvCamera2.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/features2d/hal/interface.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/features2d/features2d.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/videoio.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/opencv_modules.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/core.hpp", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfInt.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/ArrayUtil.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/MatOfPoint3f.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Headers/Point3d.h", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Modules/opencv2.swiftmodule/arm64-apple-ios.swiftinterface", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Modules/opencv2.swiftmodule/arm64-apple-ios.swiftdoc", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Modules/opencv2.swiftmodule/Project/arm64-apple-ios.swiftsourceinfo", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Modules/opencv2.swiftmodule/arm64-apple-ios.abi.json", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Modules/opencv2.swiftmodule/arm64-apple-ios.private.swiftinterface", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/Modules/module.modulemap", + "opencv2.xcframework/ios-arm64/opencv2.framework/Versions/A/opencv2", +]