
Preparing data sets for machine learning with MediaPipe

We include pipelines to prepare data sets for training TensorFlow models: a small demo data set, the Charades data set, and custom videos in the Kinetics format.

Using these data sets takes two stages. First, the data set is constructed with a Python script and a MediaPipe C++ binary. The end user should compile the C++ binary, because preparing different data sets requires different MediaPipe calculator dependencies. Running the script produces a data set of TFRecord files on disk. Second, the data is read from TensorFlow into a tf.data.Dataset. Each pipeline can be imported and supports a simple call to as_dataset() to make the data available.
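In practice the TFRecord files are read back through TensorFlow, and tf.data.TFRecordDataset is TensorFlow's own reader for this format. To show what "TFRecord files on disk" means without depending on TensorFlow, here is a minimal standard-library sketch of the record framing. It assumes the uncompressed TFRecord layout: an 8-byte little-endian length, a masked CRC-32C of that length, the payload, then a masked CRC-32C of the payload. The helper names (`crc32c`, `masked_crc`, `encode_record`, `iter_records`) are hypothetical and are not part of MediaPipe.

```python
import struct


def _make_crc32c_table():
    """Builds the lookup table for CRC-32C (Castagnoli, reflected poly 0x82F63B78)."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _make_crc32c_table()


def crc32c(data: bytes) -> int:
    """Computes the CRC-32C checksum that TFRecord framing uses."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    """Applies TFRecord's masking: rotate right by 15 bits, then add a constant."""
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def encode_record(record: bytes) -> bytes:
    """Frames one payload: length, masked length CRC, payload, masked payload CRC."""
    header = struct.pack("<Q", len(record))
    return (header + struct.pack("<I", masked_crc(header))
            + record + struct.pack("<I", masked_crc(record)))


def iter_records(blob: bytes):
    """Yields each payload in an uncompressed TFRecord byte string, checking CRCs."""
    offset = 0
    while offset < len(blob):
        if offset + 12 > len(blob):
            raise ValueError("truncated record header")
        header = blob[offset:offset + 8]
        (length,) = struct.unpack("<Q", header)
        (length_crc,) = struct.unpack_from("<I", blob, offset + 8)
        if length_crc != masked_crc(header):
            raise ValueError("corrupt length header")
        start = offset + 12
        end = start + length
        if end + 4 > len(blob):
            raise ValueError("truncated record payload")
        data = blob[start:end]
        (data_crc,) = struct.unpack_from("<I", blob, end)
        if data_crc != masked_crc(data):
            raise ValueError("corrupt record payload")
        yield data
        offset = end + 4
```

To inspect a generated file, read it with `open(path, "rb").read()` and pass the bytes to `iter_records`. Each payload would typically be a serialized TensorFlow protocol buffer (MediaSequence stores its data in `tf.train.SequenceExample` messages), which this sketch leaves undecoded.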

Demo data set

To generate the demo data set, you must have TensorFlow installed. Build the media_sequence_demo binary from the top-level directory of the mediapipe repository, then run the data set command from that same directory.

bazel build -c opt mediapipe/examples/desktop/media_sequence:media_sequence_demo \
  --define MEDIAPIPE_DISABLE_GPU=1

python -m mediapipe.examples.desktop.media_sequence.demo_dataset \
  --alsologtostderr \
  --path_to_demo_data=/tmp/demo_data/ \
  --path_to_mediapipe_binary=bazel-bin/mediapipe/examples/desktop/\
media_sequence/media_sequence_demo  \
  --path_to_graph_directory=mediapipe/graphs/media_sequence/
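If you prefer to launch the pipeline from Python rather than a shell, the command above can be assembled as an argument list and run with the working directory pinned to the repository root. This is a minimal sketch: `dataset_command` and `run_dataset_script` are hypothetical helpers, and the module name, flags, and relative paths are copied from the commands in this README. It uses `sys.executable` instead of a bare `python` so that the current interpreter, which must have TensorFlow installed, is the one that runs the script.

```python
import subprocess
import sys

# Relative paths copied from the commands in this README. They only resolve
# when the working directory is the top of the mediapipe repository.
MEDIAPIPE_BINARY = "bazel-bin/mediapipe/examples/desktop/media_sequence/media_sequence_demo"
GRAPH_DIRECTORY = "mediapipe/graphs/media_sequence/"
MODULE_PREFIX = "mediapipe.examples.desktop.media_sequence."


def dataset_command(script: str, **flags: str) -> list:
    """Builds the argv for `python -m <module> --alsologtostderr ...`.

    `script` is the module basename (e.g. "demo_dataset"). Each keyword
    argument becomes a `--name=value` flag, followed by the binary and
    graph-directory flags that every command in this README passes.
    """
    argv = [sys.executable, "-m", MODULE_PREFIX + script, "--alsologtostderr"]
    argv += [f"--{name}={value}" for name, value in flags.items()]
    argv += [
        f"--path_to_mediapipe_binary={MEDIAPIPE_BINARY}",
        f"--path_to_graph_directory={GRAPH_DIRECTORY}",
    ]
    return argv


def run_dataset_script(repo_root: str, script: str, **flags: str) -> None:
    """Runs the pipeline with cwd set to the repository root; raises on failure."""
    subprocess.run(dataset_command(script, **flags), cwd=repo_root, check=True)
```

For example, `run_dataset_script("/path/to/mediapipe", "demo_dataset", path_to_demo_data="/tmp/demo_data/")` mirrors the shell command above, where `/path/to/mediapipe` is a placeholder for your checkout.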

Charades data set

The Charades data set is ready for training and/or evaluating action recognition models in TensorFlow. You may only use this script in ways that comply with the Allen Institute for Artificial Intelligence's license for the Charades data set.

To generate the Charades data set, you must have TensorFlow installed. Build the media_sequence_demo binary from the top-level directory of the mediapipe repository, then run the data set command from that same directory.

bazel build -c opt mediapipe/examples/desktop/media_sequence:media_sequence_demo \
  --define MEDIAPIPE_DISABLE_GPU=1

python -m mediapipe.examples.desktop.media_sequence.charades_dataset \
  --alsologtostderr \
  --path_to_charades_data=/tmp/charades_data/ \
  --path_to_mediapipe_binary=bazel-bin/mediapipe/examples/desktop/\
media_sequence/media_sequence_demo  \
  --path_to_graph_directory=mediapipe/graphs/media_sequence/
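Because the binary and graph paths in these commands are relative, running from the wrong directory or before the Bazel build has finished will make the script fail. A quick pre-flight check can catch this first. The sketch below is hypothetical (`missing_prerequisites` is not a MediaPipe function) and only tests for the two paths that the commands above pass as flags.

```python
import os

# The paths passed as --path_to_mediapipe_binary and --path_to_graph_directory
# above, each paired with the check it should satisfy.
REQUIRED = [
    ("bazel-bin/mediapipe/examples/desktop/media_sequence/media_sequence_demo",
     os.path.isfile),
    ("mediapipe/graphs/media_sequence", os.path.isdir),
]


def missing_prerequisites(repo_root: str = ".") -> list:
    """Returns the required relative paths that are absent under repo_root."""
    return [
        rel_path
        for rel_path, check in REQUIRED
        if not check(os.path.join(repo_root, rel_path))
    ]
```

An empty list means the current directory looks like a mediapipe checkout with the binary already built. Otherwise, build the binary or `cd` to the repository root before running the pipeline.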

Custom videos in the Kinetics format

To produce data in the same format as the Kinetics data, use the kinetics_dataset.py script.

To generate the data set, you must have TensorFlow installed. Build the media_sequence_demo binary from the top-level directory of the mediapipe repository, then run the data set command from that same directory.

echo "Credit for this video belongs to: ESA/Hubble; Music: Johan B. Monell"
wget https://cdn.spacetelescope.org/archives/videos/medium_podcast/heic1608c.mp4 -O /tmp/heic1608c.mp4
CUSTOM_CSV=/tmp/custom_kinetics.csv
VIDEO_PATH=/tmp/heic1608c.mp4
echo -e "video,time_start,time_end,split\n${VIDEO_PATH},0,10,custom" > ${CUSTOM_CSV}

bazel build -c opt mediapipe/examples/desktop/media_sequence:media_sequence_demo \
  --define MEDIAPIPE_DISABLE_GPU=1

python -m mediapipe.examples.desktop.media_sequence.kinetics_dataset \
  --alsologtostderr \
  --splits_to_process=custom \
  --path_to_custom_csv=${CUSTOM_CSV} \
  --video_path_format_string={video} \
  --path_to_kinetics_data=/tmp/ms/kinetics/ \
  --path_to_mediapipe_binary=bazel-bin/mediapipe/examples/desktop/\
media_sequence/media_sequence_demo  \
  --path_to_graph_directory=mediapipe/graphs/media_sequence/
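The `echo -e` line above writes a one-clip CSV. For more clips, or for paths that might contain commas, the same file can be written with Python's `csv` module. This is a minimal sketch: `write_custom_kinetics_csv` is a hypothetical helper, and the column order is copied from the header in the example above. The time values are passed through unchanged, so use the same units as the `0` and `10` in that example.

```python
import csv

# Column order taken from the header line written by the echo command above.
FIELDS = ["video", "time_start", "time_end", "split"]


def write_custom_kinetics_csv(path: str, clips) -> None:
    """Writes a header plus one (video, time_start, time_end, split) row per clip."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(FIELDS)
        for clip in clips:
            writer.writerow(clip)
```

Calling `write_custom_kinetics_csv("/tmp/custom_kinetics.csv", [("/tmp/heic1608c.mp4", 0, 10, "custom")])` reproduces the file from the example. In that example, the `split` value `custom` is the same name passed to `--splits_to_process`.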