build_timeseries

build_timeseries(df, features, timeslice_length, window_step_size=1, ignore_label='not_defined')

Split the data in df into time-series segments of length timeslice_length that contain only the given features.

Let's assume you have some data that has been labelled using the label_assistant module. For the sake of this example, we will just use some dummy data and set some dummy labels manually.

df = pd.read_hdf('example_classified_data/labelled_behaviors.h5')
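# Assign in a single indexing step; chained indexing can write to a temporary copy.
behavior_col = df.columns.get_loc('behavior')
df.iloc[22:38, behavior_col] = 'foobar'
df.iloc[81:99, behavior_col] = 'baz'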
df
bodyparts head beak left_neck right_neck ... right_down_wing body tail file_name frame behavior
coords x y likelihood x y likelihood x y likelihood x ... likelihood x y likelihood x y likelihood
0 773.376465 231.518768 0.999999 726.495178 235.638046 0.999981 726.502014 277.634125 0.999998 803.271179 ... 0.999997 804.008545 350.669586 0.999992 874.878601 485.749908 0.999999 coordinates.h5 0 not_defined
1 773.129822 231.487213 0.999999 725.662231 235.242844 0.999951 725.964478 278.003082 0.999999 803.197144 ... 0.999989 802.684265 345.021454 0.999873 875.375854 487.185547 0.999997 coordinates.h5 1 not_defined
2 773.009827 231.793518 0.999999 726.025696 235.272522 0.999978 725.764893 278.884918 0.999998 802.567810 ... 0.999995 801.531067 349.937347 0.999946 876.269714 485.816010 0.999999 coordinates.h5 2 pecking
3 773.748779 231.791260 0.999999 726.288940 235.864319 0.999985 725.889465 279.045715 0.999998 803.356934 ... 0.999994 802.792908 350.675842 0.999970 875.973022 485.560150 0.999998 coordinates.h5 3 pecking
4 774.934326 231.623734 0.999999 726.298279 235.749908 0.999990 726.302551 278.388367 0.999999 802.530273 ... 0.999998 803.659973 351.269745 0.999938 876.481873 485.140839 0.999998 coordinates.h5 4 pecking
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
95 691.788513 232.490265 1.000000 673.796082 238.801743 0.018886 697.399841 282.134796 0.999998 737.725342 ... 0.999980 788.017456 337.912994 0.999999 882.997253 483.786896 1.000000 coordinates2.h5 95 not_defined
96 691.545410 232.707413 1.000000 673.634888 238.658234 0.016135 697.256165 283.058899 0.999999 736.505920 ... 0.999943 788.334045 339.911743 0.999999 884.470215 483.485382 1.000000 coordinates2.h5 96 not_defined
97 691.117371 232.242767 1.000000 673.748840 239.055954 0.007289 696.269043 282.351929 0.999999 735.976685 ... 0.999916 785.626465 338.561829 0.999997 885.270691 485.053131 0.999999 coordinates2.h5 97 not_defined
98 691.294067 232.225220 1.000000 673.927002 239.141891 0.004682 695.629456 282.407013 1.000000 735.639404 ... 0.999876 786.011963 338.520691 0.999997 885.585388 484.755859 0.999999 coordinates2.h5 98 not_defined
99 691.483643 232.269226 1.000000 673.797241 239.390625 0.010126 695.367371 281.720947 0.999999 735.199585 ... 0.999786 785.282776 338.077087 0.999992 885.361023 483.480896 1.000000 coordinates2.h5 99 not_defined

200 rows x 39 columns

The build_timeseries function can be used to prepare the data for Keras. By setting window_step_size, it is possible to model a sliding (default, window_step_size = 1), hopping (1 < window_step_size < timeslice_length), or tumbling (window_step_size = timeslice_length) window. The first example below uses a tumbling window.
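To make the three window types concrete, the following sketch lists the start index of each window for a given step size. It is an illustration only: window_starts is a hypothetical helper that is not part of this module, and it assumes that only full-length windows are kept.

```python
def window_starts(n_rows, timeslice_length, window_step_size=1):
    """Hypothetical helper: start index of every full-length window."""
    last_start = n_rows - timeslice_length
    return list(range(0, last_start + 1, window_step_size))


n_rows, timeslice_length = 10, 4

# Sliding window: consecutive windows overlap in all but one row.
sliding = window_starts(n_rows, timeslice_length)
# Hopping window: consecutive windows overlap partially.
hopping = window_starts(n_rows, timeslice_length, window_step_size=2)
# Tumbling window: consecutive windows do not overlap.
tumbling = window_starts(n_rows, timeslice_length, window_step_size=timeslice_length)

print(sliding)   # [0, 1, 2, 3, 4, 5, 6]
print(hopping)   # [0, 2, 4, 6]
print(tumbling)  # [0, 4]
```

With 10 rows and a window length of 4, the sliding window yields seven segments, the hopping window four, and the tumbling window two.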

features = [('head', 'x'), ('head', 'y'), ('tail', 'x')]
timeslice_length = 20
segmented_timeseries, label_vector = build_timeseries(df, features, timeslice_length, window_step_size=timeslice_length)

result_shape = segmented_timeseries.shape
test_eq(result_shape[0], 2)
test_eq(result_shape[1], 3)
test_eq(result_shape[2], timeslice_length)
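The assertions above imply that the result is laid out as (segments, features, time steps). The following NumPy sketch reproduces that layout for a tumbling window on dummy data. It is not the module's implementation, and it does not drop any segments, so it yields 10 segments rather than the 2 that remain above once segments labelled not_defined are excluded.

```python
import numpy as np

# Dummy data: 200 rows (frames) and 3 feature columns.
n_rows, n_features = 200, 3
data = np.arange(n_rows * n_features, dtype=float).reshape(n_rows, n_features)
timeslice_length = 20

# Tumbling window: cut the rows into consecutive, non-overlapping blocks.
n_segments = n_rows // timeslice_length
blocks = data[: n_segments * timeslice_length].reshape(
    n_segments, timeslice_length, n_features
)

# Move the feature axis in front of the time axis: (segments, features, time).
segments = blocks.transpose(0, 2, 1)
print(segments.shape)  # (10, 3, 20)
```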

As an alternative, we can build segments using a sliding window (which is used as the default).

features = [('head', 'x'), ('head', 'y'), ('tail', 'x')]
timeslice_length = 7
sliding_segmented_timeseries, sliding_label_vector = build_timeseries(df, features, timeslice_length)

result_shape = sliding_segmented_timeseries.shape
test_eq(result_shape[0], 43)
test_eq(result_shape[1], 3)
test_eq(result_shape[2], timeslice_length)

# segments that are identified as `not_defined` are not included
test_eq(np.isin("not_defined", sliding_label_vector), False)
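How a segment receives its label is not stated here. As a rough sketch of the idea behind ignore_label, the following example assumes that each segment takes the most frequent label among its rows and that segments whose label equals ignore_label are dropped. The helper label_segments and the majority-vote rule are assumptions made for illustration, not the module's documented behavior.

```python
from collections import Counter


def label_segments(labels, timeslice_length, window_step_size=1,
                   ignore_label="not_defined"):
    """Hypothetical helper: (start, label) for each kept window.

    Assumes a majority vote per window; windows whose majority label
    equals `ignore_label` are skipped.
    """
    kept = []
    last_start = len(labels) - timeslice_length
    for start in range(0, last_start + 1, window_step_size):
        window = labels[start:start + timeslice_length]
        majority, _count = Counter(window).most_common(1)[0]
        if majority != ignore_label:
            kept.append((start, majority))
    return kept


labels = ["not_defined"] * 4 + ["pecking"] * 4 + ["not_defined"] * 4

# Tumbling window of length 4: only the middle window is labelled.
print(label_segments(labels, timeslice_length=4, window_step_size=4))
# [(4, 'pecking')]
```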