Foundational Research and Dataset Lineage
The entire Ethan AI system is based on the Ethan AI Behavioral Events Dataset (E-BED), a multimodal dataset built from video and audio recordings. E-BED provides annotated behavioral events captured during natural daily activities across home, classroom, and therapy settings. The dataset builds on existing research in this area.
The Self-Stimulatory Behaviours Dataset (SSBD), introduced by Rajagopalan et al. (2013), served as our foundational dataset for behavior analysis. SSBD consists of 66 publicly available videos curated from platforms such as YouTube, Vimeo, and Dailymotion, each averaging approximately 90 seconds and recorded in uncontrolled, “in-the-wild” settings. Building on SSBD, we derived a focused dataset for hand flapping detection by segmenting videos into short clips (2–4 seconds) and labeling each clip as Hand Flapping or Normal. This derived dataset was introduced as part of the Ethan AI E-BED dataset.
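The clip segmentation step described above can be sketched roughly as follows. This is a minimal illustration, not the actual E-BED pipeline: the function name `segment_clips`, the fixed 3-second window, and the rule of dropping a trailing remainder shorter than 2 seconds are all assumptions chosen to stay within the 2–4 second range mentioned in the text.

```python
def segment_clips(duration_s, clip_len_s=3.0, min_len_s=2.0):
    """Split a video of duration_s seconds into consecutive clip windows.

    Hypothetical helper: windows are clip_len_s long, and a trailing
    remainder is kept only if it lasts at least min_len_s seconds.
    Returns a list of (start_s, end_s) tuples.
    """
    clips = []
    start = 0.0
    while start < duration_s:
        end = min(start + clip_len_s, duration_s)
        if end - start >= min_len_s:
            clips.append((start, end))
        start += clip_len_s
    return clips


# A video of about 90 seconds (the SSBD average) yields 30 three-second windows.
windows = segment_clips(90.0)
print(len(windows))
```

Each resulting window would then be labeled as Hand Flapping or Normal by an annotator; the labeling itself is not shown here.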
The ARRBD dataset contains 780 clips across 10 behavioral categories. It expanded behavioral coverage, but also increased the risk of overfitting. Task-specific datasets were derived from ARRBD and added to E-BED. On this improved dataset, recall across critical behaviors showed a marked jump from 40% to 60%.
E-BED leverages generative AI to introduce label-preserving variation. In E-BED, each video is annotated with precise event onset and offset times, event categories, and modality indicators, enabling both event classification and event detection tasks. In addition to overt behaviors such as repetitive motor actions and aggressive episodes, the dataset explicitly includes neutral and pre-event segments, allowing the study of early escalation cues and reducing bias toward extreme behaviors.
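To make the annotation style concrete, one such record might be represented as sketched below. The E-BED schema is not published, so every field name here (`video_id`, `onset_s`, `offset_s`, `category`, `modalities`) and the `pre_event_window` helper with its 5-second lookback are hypothetical, chosen only to mirror the properties listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BehaviorEvent:
    """Hypothetical annotation record for one behavioral event."""
    video_id: str
    onset_s: float      # event start time, in seconds
    offset_s: float     # event end time, in seconds
    category: str       # e.g. "hand_flapping" or "neutral"
    modalities: tuple   # e.g. ("video", "audio")

    @property
    def duration_s(self):
        return self.offset_s - self.onset_s


def pre_event_window(event, lookback_s=5.0):
    """Return the (start_s, end_s) span just before the event onset.

    The lookback length is an assumption; the span is clamped so it
    never starts before the beginning of the video.
    """
    start = max(0.0, event.onset_s - lookback_s)
    return (start, event.onset_s)


event = BehaviorEvent("clip_001", 12.0, 15.5, "hand_flapping", ("video", "audio"))
print(event.duration_s, pre_event_window(event))
```

Storing onset and offset times alongside the category lets the same record serve both tasks: the category alone supports classification, while the time span supports detection and the extraction of pre-event segments.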
| Dataset | Source | Scale | Annotation Style | Availability |
|---|---|---|---|---|
| SSBD (2013) | Public online videos | 66 videos | Clip-level behavior labels | Public |
| ARRBD (2023) | Curated research corpus | 780 clips | Multi-class behavior labels | Public |
| EBD (2025) | Expanded SSBD variants | Varies | Refined / extended labels | Limited |
| E-BED (2026) | Ethan AI pilots + derived data | Ongoing | Temporal events, phases, multimodal | Internal |
E-BED is constructed using consented data from Ethan AI pilot deployments across therapy centers and individual homes, supplemented by derived public sources where appropriate. Because of the sensitive behavioral contexts involved, E-BED is not publicly available. Dataset scale and raw samples are not disclosed; the data is used exclusively for internal training, validation, and longitudinal evaluation under strict governance and privacy controls.
If you are a fellow researcher working in autism, autism diagnosis, behavioral intelligence, or assistive care, do reach out at hello@ethanai.in.