Timeseries Preparation

`build_timeseries`[source]

build_timeseries(df, features, timeslice_length, window_step_size=1, ignore_label='not_defined')

Split the data in df into timeseries segments of length timeslice_length, each containing the given features.

Let's assume you have some data that has been labelled using the label_assistant module. For the sake of this example, we will just use some dummy data and set some dummy labels manually.

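import pandas as pd

df = pd.read_hdf('example_classified_data/labelled_behaviors.h5')
# Assign through a single .iloc call: chained indexing (df.iloc[...].loc[...] = ...)
# may write to a temporary copy and leave df unchanged.
behavior_col = df.columns.get_loc('behavior')
df.iloc[22:38, behavior_col] = 'foobar'
df.iloc[81:99, behavior_col] = 'baz'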
df

bodyparts	head			beak			left_neck			right_neck	...	right_down_wing	body			tail			file_name	frame	behavior
coords	x	y	likelihood	x	y	likelihood	x	y	likelihood	x	...	likelihood	x	y	likelihood	x	y	likelihood
0	773.376465	231.518768	0.999999	726.495178	235.638046	0.999981	726.502014	277.634125	0.999998	803.271179	...	0.999997	804.008545	350.669586	0.999992	874.878601	485.749908	0.999999	coordinates.h5	0	not_defined
1	773.129822	231.487213	0.999999	725.662231	235.242844	0.999951	725.964478	278.003082	0.999999	803.197144	...	0.999989	802.684265	345.021454	0.999873	875.375854	487.185547	0.999997	coordinates.h5	1	not_defined
2	773.009827	231.793518	0.999999	726.025696	235.272522	0.999978	725.764893	278.884918	0.999998	802.567810	...	0.999995	801.531067	349.937347	0.999946	876.269714	485.816010	0.999999	coordinates.h5	2	pecking
3	773.748779	231.791260	0.999999	726.288940	235.864319	0.999985	725.889465	279.045715	0.999998	803.356934	...	0.999994	802.792908	350.675842	0.999970	875.973022	485.560150	0.999998	coordinates.h5	3	pecking
4	774.934326	231.623734	0.999999	726.298279	235.749908	0.999990	726.302551	278.388367	0.999999	802.530273	...	0.999998	803.659973	351.269745	0.999938	876.481873	485.140839	0.999998	coordinates.h5	4	pecking
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
95	691.788513	232.490265	1.000000	673.796082	238.801743	0.018886	697.399841	282.134796	0.999998	737.725342	...	0.999980	788.017456	337.912994	0.999999	882.997253	483.786896	1.000000	coordinates2.h5	95	not_defined
96	691.545410	232.707413	1.000000	673.634888	238.658234	0.016135	697.256165	283.058899	0.999999	736.505920	...	0.999943	788.334045	339.911743	0.999999	884.470215	483.485382	1.000000	coordinates2.h5	96	not_defined
97	691.117371	232.242767	1.000000	673.748840	239.055954	0.007289	696.269043	282.351929	0.999999	735.976685	...	0.999916	785.626465	338.561829	0.999997	885.270691	485.053131	0.999999	coordinates2.h5	97	not_defined
98	691.294067	232.225220	1.000000	673.927002	239.141891	0.004682	695.629456	282.407013	1.000000	735.639404	...	0.999876	786.011963	338.520691	0.999997	885.585388	484.755859	0.999999	coordinates2.h5	98	not_defined
99	691.483643	232.269226	1.000000	673.797241	239.390625	0.010126	695.367371	281.720947	0.999999	735.199585	...	0.999786	785.282776	338.077087	0.999992	885.361023	483.480896	1.000000	coordinates2.h5	99	not_defined

200 rows x 39 columns

The build_timeseries function can be used to prepare the data for Keras. The value of window_step_size determines the kind of window: a sliding window (default, window_step_size = 1), a hopping window (1 < window_step_size < timeslice_length), or a tumbling window (window_step_size = timeslice_length). The following example uses a tumbling window.
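To make the three window types concrete, the sketch below lists the start index of every full window for a given step size. It is an illustration only: `window_starts` is a hypothetical helper, not part of this library, and it ignores any label handling that build_timeseries may perform.

```python
def window_starts(n_rows, timeslice_length, window_step_size=1):
    """Start indices of all full windows over n_rows rows (hypothetical helper)."""
    return list(range(0, n_rows - timeslice_length + 1, window_step_size))


n_rows, timeslice_length = 10, 4

# Sliding: consecutive windows overlap in all but one row.
sliding = window_starts(n_rows, timeslice_length)
# Hopping: windows overlap partially.
hopping = window_starts(n_rows, timeslice_length, window_step_size=2)
# Tumbling: windows do not overlap.
tumbling = window_starts(n_rows, timeslice_length, window_step_size=timeslice_length)

print(sliding, hopping, tumbling)
```

With 10 rows and a window length of 4, only two tumbling windows fit; the trailing rows that do not fill a complete window are dropped in this sketch.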

features = [('head', 'x'), ('head', 'y'), ('tail', 'x')]
timeslice_length = 20
segmented_timeseries, label_vector = build_timeseries(df, features, timeslice_length, window_step_size=timeslice_length)

result_shape = segmented_timeseries.shape
test_eq(result_shape[0], 2)
test_eq(result_shape[1], 3)
test_eq(result_shape[2], timeslice_length)
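The assertions above imply a result layout of (segments, features, timeslice_length). The sketch below reproduces that layout on dummy data with NumPy's `sliding_window_view` (available from NumPy 1.20). It is not the implementation of build_timeseries, and the array `data` and its sizes are assumptions made for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Dummy data (assumed for illustration): 100 rows, 3 feature columns.
n_rows, n_features, timeslice_length = 100, 3, 20
data = np.arange(n_rows * n_features, dtype=float).reshape(n_rows, n_features)

# All sliding windows along the row axis; the window axis is appended last,
# giving the layout (segments, features, timeslice_length).
windows = sliding_window_view(data, timeslice_length, axis=0)

# Keeping every timeslice_length-th window yields non-overlapping (tumbling) segments.
tumbling_segments = windows[::timeslice_length]

print(windows.shape, tumbling_segments.shape)
```

Segment `i` of feature `j` in `tumbling_segments` then holds rows `i * timeslice_length` up to `(i + 1) * timeslice_length` of column `j`.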

As an alternative, we can build segments using a sliding window (which is used as the default).

features = [('head', 'x'), ('head', 'y'), ('tail', 'x')]
timeslice_length = 7
sliding_segmented_timeseries, sliding_label_vector = build_timeseries(df, features, timeslice_length)

result_shape = sliding_segmented_timeseries.shape
test_eq(result_shape[0], 43)
test_eq(result_shape[1], 3)
test_eq(result_shape[2], timeslice_length)

# segments that are identified as `not_defined` are not included
test_eq(np.isin("not_defined", sliding_label_vector), False)
