Foundational Research and Dataset Lineage

The entire Ethan AI system is built on the Ethan AI Behavioral Events Dataset (E-BED), a multimodal dataset of video and audio recordings. E-BED provides annotated behavioral events captured during natural daily activities across home, classroom, and therapy settings. It builds on existing research datasets in this area.

The Self-Stimulatory Behaviours Dataset (SSBD), introduced by Rajagopalan et al. (2013), served as our foundational dataset for behavior analysis. SSBD consists of 66 publicly available videos curated from platforms such as YouTube, Vimeo, and Dailymotion, each averaging approximately 90 seconds and recorded in uncontrolled, “in-the-wild” settings. Building on SSBD, we derived a focused dataset for hand flapping detection by segmenting videos into short clips (2–4 seconds) and labeling each clip as Hand Flapping or Normal. This derived dataset was introduced as part of E-BED.
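The clip derivation described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline actually used: the fixed 3-second clip length, the 50% overlap threshold, and the names Clip, segment_video, and flapping_intervals are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    start_s: float
    end_s: float
    label: str


def segment_video(duration_s, flapping_intervals, clip_len_s=3.0, min_overlap=0.5):
    """Split [0, duration_s) into fixed-length clips and label each one.

    flapping_intervals: hypothetical list of non-overlapping (start_s, end_s)
    spans where hand flapping was annotated in the source video.
    """
    clips = []
    start = 0.0
    while start + clip_len_s <= duration_s:
        end = start + clip_len_s
        # Total time inside this clip that overlaps any annotated interval.
        overlap = sum(
            max(0.0, min(end, b) - max(start, a)) for a, b in flapping_intervals
        )
        # Assumed rule: a clip counts as Hand Flapping if at least
        # min_overlap of its duration is covered by annotated flapping.
        label = "Hand Flapping" if overlap / clip_len_s >= min_overlap else "Normal"
        clips.append(Clip(start, end, label))
        start = end
    return clips


# Toy example: a 12-second video with flapping annotated from 3.5 s to 8.0 s.
clips = segment_video(duration_s=12.0, flapping_intervals=[(3.5, 8.0)])
for clip in clips:
    print(clip)
```

A trailing remainder shorter than the clip length is dropped in this sketch; a real pipeline might pad it or use overlapping windows instead.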

The ARRBD dataset contains 780 clips across 10 behavioral categories. It expanded behavioral coverage, but it also increased the risk of overfitting. Task-specific datasets were derived from ARRBD and added to E-BED. Recall across critical behaviors showed a marked jump from 40% to 60% on this improved dataset.
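For reference, recall for a behavior is the fraction of its true occurrences that the model detects. The source does not say how recall was aggregated across critical behaviors, so the sketch below simply computes it per class; the labels and predictions are made-up toy values, and per_class_recall is a hypothetical helper.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Recall per label: correctly predicted instances / true instances."""
    positives = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / count for label, count in positives.items()}


# Toy data, not drawn from any of the datasets discussed here.
y_true = ["flapping", "flapping", "flapping", "flapping", "flapping", "normal", "normal"]
y_pred = ["flapping", "flapping", "normal", "normal", "flapping", "normal", "flapping"]

recall = per_class_recall(y_true, y_pred)
print(recall)
```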

E-BED leverages generative AI to introduce label-preserving variation. In E-BED, each video is annotated with precise event onset and offset times, event categories, and modality indicators, enabling both event classification and event detection tasks. In addition to overt behaviors such as repetitive motor actions and aggressive episodes, the dataset explicitly includes neutral and pre-event segments, allowing the study of early escalation cues and reducing bias toward extreme behaviors.
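To make the annotation structure concrete, here is a hypothetical record layout and a helper that derives a pre-event segment from an event's onset. The actual E-BED schema is not published, so the field names, the 5-second lead time, and the category string are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BehaviorEvent:
    video_id: str
    onset_s: float      # event start time, in seconds
    offset_s: float     # event end time, in seconds
    category: str       # e.g. "repetitive_motor" (hypothetical label)
    modalities: tuple   # e.g. ("video", "audio")


def pre_event_window(event, lead_s=5.0, previous_offset_s=0.0):
    """Return the (start, end) span immediately before an event's onset.

    The window is clipped so that it never reaches back into the
    previous event (or before the start of the video).
    """
    start = max(previous_offset_s, event.onset_s - lead_s)
    return (start, event.onset_s)


event = BehaviorEvent("vid_001", 12.0, 18.5, "repetitive_motor", ("video", "audio"))
print(pre_event_window(event))
print(pre_event_window(event, previous_offset_s=9.0))
```

Storing onset and offset times supports detection tasks (locating events in time), while the category field alone is enough for clip-level classification.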

Dataset       Source                          Scale      Annotation Style                     Availability
SSBD (2013)   Public online videos            66 videos  Clip-level behavior labels           Public
ARRBD (2023)  Curated research corpus         780 clips  Multi-class behavior labels          Public
EBD (2025)    Expanded SSBD variants          Varies     Refined / extended labels            Limited
E-BED (2026)  Ethan AI pilots + derived data  Ongoing    Temporal events, phases, multimodal  Internal

E-BED is constructed using consented data from Ethan AI pilot deployments across therapy centers and individual homes, supplemented by derived public sources where appropriate. Due to the sensitive behavioral contexts involved, E-BED is not publicly available. Dataset scale and raw samples are not disclosed; the data is used exclusively for internal training, validation, and longitudinal evaluation under strict governance and privacy controls.

If you are a fellow researcher working in autism, autism diagnosis, behavioral intelligence, or assistive care, do reach out at hello@ethanai.in.