Preparing data sets for machine learning with MediaPipe
We include pipelines to prepare data sets for training TensorFlow models.
Using these data sets is a two-stage process. First, the data set is constructed with a Python script and a MediaPipe C++ binary. The end user should compile the C++ binary, because preparing different data sets requires different MediaPipe calculator dependencies. Running the script writes the data set to disk as TFRecord files. Second, the data is read into TensorFlow as a tf.data.Dataset. Each pipeline can be imported and supports a simple call to as_dataset() to make the data available.
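The preparation stage leaves TFRecord files on disk, and the reading stage parses them back. For illustration only, the standard-library sketch below writes and reads the TFRecord framing (a little-endian 64-bit length, a masked CRC-32C of that length, the payload, then a masked CRC-32C of the payload). It is not part of MediaPipe, and the function names are hypothetical; in practice tf.data.TFRecordDataset does this work, and for MediaSequence each payload is a serialized tf.train.SequenceExample.

```python
import os
import struct
import tempfile
from typing import Iterable, Iterator

_CRC32C_POLY = 0x82F63B78  # reflected Castagnoli polynomial
_MASK_DELTA = 0xA282EAD8   # constant TFRecord adds when masking stored CRCs


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C: slow, but free of third-party dependencies."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (_CRC32C_POLY if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    """Rotate the CRC right by 15 bits, then add the mask delta (mod 2**32)."""
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + _MASK_DELTA) & 0xFFFFFFFF


def write_records(path: str, records: Iterable[bytes]) -> None:
    """Write each payload as one TFRecord frame."""
    with open(path, "wb") as f:
        for payload in records:
            header = struct.pack("<Q", len(payload))
            f.write(header)
            f.write(struct.pack("<I", masked_crc(header)))
            f.write(payload)
            f.write(struct.pack("<I", masked_crc(payload)))


def _read_exact(f, n: int) -> bytes:
    data = f.read(n)
    if len(data) != n:
        raise ValueError("truncated TFRecord file")
    return data


def read_records(path: str) -> Iterator[bytes]:
    """Yield payloads, verifying both checksums of every frame."""
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                return  # clean end of file
            if len(header) != 8:
                raise ValueError("truncated TFRecord header")
            (length,) = struct.unpack("<Q", header)
            (length_crc,) = struct.unpack("<I", _read_exact(f, 4))
            if length_crc != masked_crc(header):
                raise ValueError("corrupt length checksum")
            payload = _read_exact(f, length)
            (payload_crc,) = struct.unpack("<I", _read_exact(f, 4))
            if payload_crc != masked_crc(payload):
                raise ValueError("corrupt payload checksum")
            yield payload


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "example.tfrecord")
    write_records(path, [b"first", b"second"])
    print(list(read_records(path)))  # [b'first', b'second']
```

The checksums are why a half-written file fails loudly instead of yielding garbage, which matters when a long data set build is interrupted.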
Demo data set
To generate the demo data set you must have TensorFlow installed. Then build the media_sequence_demo binary from the top-level directory of the MediaPipe repository, and run the command that builds the data set from that same directory.
bazel build -c opt mediapipe/examples/desktop/media_sequence:media_sequence_demo \
--define MEDIAPIPE_DISABLE_GPU=1
python -m mediapipe.examples.desktop.media_sequence.demo_dataset \
--alsologtostderr \
--path_to_demo_data=/tmp/demo_data/ \
--path_to_mediapipe_binary=bazel-bin/mediapipe/examples/desktop/media_sequence/media_sequence_demo \
--path_to_graph_directory=mediapipe/graphs/media_sequence/
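The commands above use paths relative to the repository root, which is why they must run from the top-level directory. A hypothetical preflight helper like the one below (not part of MediaPipe) can catch a wrong working directory, a skipped build, or a missing TensorFlow install before the script starts; the paths are copied from the commands.

```python
import importlib.util
import os
from typing import List

# Relative paths copied from the demo commands; they resolve only from the repo root.
BINARY = "bazel-bin/mediapipe/examples/desktop/media_sequence/media_sequence_demo"
GRAPH_DIR = "mediapipe/graphs/media_sequence/"


def missing_prerequisites(repo_root: str = ".", check_tensorflow: bool = True) -> List[str]:
    """Return a list of problems; an empty list means the setup looks usable."""
    problems = []
    if check_tensorflow and importlib.util.find_spec("tensorflow") is None:
        problems.append("TensorFlow is not installed in this Python environment")
    binary = os.path.join(repo_root, BINARY)
    if not os.path.isfile(binary):
        problems.append(f"binary not built: {binary} (run the bazel build step first)")
    graph_dir = os.path.join(repo_root, GRAPH_DIR)
    if not os.path.isdir(graph_dir):
        problems.append(f"graph directory missing: {graph_dir} (wrong working directory?)")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```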
Charades data set
The Charades data set is ready for training and/or evaluating action recognition models in TensorFlow. You may only use this script in ways that comply with the Allen Institute for Artificial Intelligence's license for the Charades data set.
To generate the Charades data set you must have TensorFlow installed. Then build the media_sequence_demo binary from the top-level directory of the MediaPipe repository, and run the command that builds the data set from that same directory.
bazel build -c opt mediapipe/examples/desktop/media_sequence:media_sequence_demo \
--define MEDIAPIPE_DISABLE_GPU=1
python -m mediapipe.examples.desktop.media_sequence.charades_dataset \
--alsologtostderr \
--path_to_charades_data=/tmp/charades_data/ \
--path_to_mediapipe_binary=bazel-bin/mediapipe/examples/desktop/media_sequence/media_sequence_demo \
--path_to_graph_directory=mediapipe/graphs/media_sequence/
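The two Charades steps can also be scripted. The sketch below assumes Python 3.8+ and a working directory at the repository root; `charades_commands` and `run` are hypothetical helpers that only assemble the argument lists shown above and, when `dry_run=False`, hand them to `subprocess.run`.

```python
import shlex
import subprocess
import sys
from typing import List

# Paths and flags copied from the Charades commands; they resolve only from the repo root.
BINARY = "bazel-bin/mediapipe/examples/desktop/media_sequence/media_sequence_demo"
GRAPH_DIR = "mediapipe/graphs/media_sequence/"


def charades_commands(data_dir: str = "/tmp/charades_data/") -> List[List[str]]:
    """Return argv lists for the build step and the data set generation step."""
    build = [
        "bazel", "build", "-c", "opt",
        "mediapipe/examples/desktop/media_sequence:media_sequence_demo",
        "--define", "MEDIAPIPE_DISABLE_GPU=1",
    ]
    generate = [
        sys.executable,  # reuse the current interpreter; it must have TensorFlow
        "-m", "mediapipe.examples.desktop.media_sequence.charades_dataset",
        "--alsologtostderr",
        f"--path_to_charades_data={data_dir}",
        f"--path_to_mediapipe_binary={BINARY}",
        f"--path_to_graph_directory={GRAPH_DIR}",
    ]
    return [build, generate]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; also execute it unless dry_run is set."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # raises on the first failing step


if __name__ == "__main__":
    run(charades_commands())
```

Passing argument lists instead of a single shell string avoids the line-continuation and quoting pitfalls of the copy-pasted commands.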
Custom videos in the Kinetics format
To produce data in the same format as the Kinetics data, use the kinetics_dataset.py script.
To generate the data set you must have TensorFlow installed. Then build the media_sequence_demo binary from the top-level directory of the MediaPipe repository, and run the command that builds the data set from that same directory.
echo "Credit for this video belongs to: ESA/Hubble; Music: Johan B. Monell"
wget https://cdn.spacetelescope.org/archives/videos/medium_podcast/heic1608c.mp4 -O /tmp/heic1608c.mp4
CUSTOM_CSV=/tmp/custom_kinetics.csv
VIDEO_PATH=/tmp/heic1608c.mp4
echo -e "video,time_start,time_end,split\n${VIDEO_PATH},0,10,custom" > ${CUSTOM_CSV}
bazel build -c opt mediapipe/examples/desktop/media_sequence:media_sequence_demo \
--define MEDIAPIPE_DISABLE_GPU=1
python -m mediapipe.examples.desktop.media_sequence.kinetics_dataset \
--alsologtostderr \
--splits_to_process=custom \
--path_to_custom_csv=${CUSTOM_CSV} \
--video_path_format_string={video} \
--path_to_kinetics_data=/tmp/ms/kinetics/ \
--path_to_mediapipe_binary=bazel-bin/mediapipe/examples/desktop/media_sequence/media_sequence_demo \
--path_to_graph_directory=mediapipe/graphs/media_sequence/
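The echo command writes a custom CSV with the header video,time_start,time_end,split and one row per clip (the times 0 and 10 are presumably seconds). For more than a handful of clips, Python's csv module avoids quoting mistakes. The sketch below is a hypothetical equivalent of that step. It also illustrates how `--video_path_format_string={video}` presumably resolves, under the unconfirmed assumption that the script fills the placeholder from each row's columns with `str.format`.

```python
import csv
from typing import List, Sequence, Tuple

FIELDS = ["video", "time_start", "time_end", "split"]  # header written by the echo command


def write_custom_csv(path: str, clips: Sequence[Tuple[str, float, float, str]]) -> None:
    """Write one row per (video, time_start, time_end, split) clip."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(FIELDS)
        writer.writerows(clips)


def resolve_video_paths(path: str, format_string: str = "{video}") -> List[str]:
    """ASSUMPTION: mimics filling the format string from each CSV row's columns."""
    with open(path, newline="") as f:
        return [format_string.format(**row) for row in csv.DictReader(f)]


if __name__ == "__main__":
    write_custom_csv("/tmp/custom_kinetics.csv", [("/tmp/heic1608c.mp4", 0, 10, "custom")])
    print(resolve_video_paths("/tmp/custom_kinetics.csv"))
```

Because the CSV in this example stores a full path in the video column, the plain `{video}` format string already points at the downloaded file.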