Foundational Research and Dataset Lineage

The entire Ethan AI system is built on the Ethan AI Behavioral Events Dataset (E-BED), a video-and-audio dataset of annotated behavioral events captured during natural daily activities across home, classroom, and therapy settings. E-BED builds on existing research datasets in this area.

The Self-Stimulatory Behaviours Dataset (SSBD), introduced by Rajagopalan et al. (2013), served as our founding dataset for behavior analysis. SSBD consists of 66 publicly available videos curated from platforms such as YouTube, Vimeo, and Dailymotion, each averaging approximately 90 seconds and recorded in uncontrolled, “in-the-wild” settings. Building on SSBD, we derived a focused dataset for hand-flapping detection by segmenting videos into short clips (2–4 seconds) and labeling each clip as Hand Flapping or Normal. This derived dataset was incorporated into the Ethan AI E-BED dataset.
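The segmentation step above can be sketched as follows. This is a minimal illustration only; the clip length of 3 seconds, the 2-second minimum remainder, and the function name are assumptions, not the production pipeline.

```python
def segment_clips(duration_s: float, clip_len_s: float = 3.0) -> list[tuple[float, float]]:
    """Split a video into consecutive fixed-length clips (clip_len_s chosen
    within the 2-4 s range), dropping any trailing remainder shorter than 2 s.
    Illustrative sketch only; parameters are assumptions."""
    clips = []
    t = 0.0
    while duration_s - t >= 2.0:
        end = min(t + clip_len_s, duration_s)
        clips.append((t, end))
        t = end
    return clips

# A ~90 s SSBD-style video yields thirty 3-second clips:
clips = segment_clips(90.0)
```

Each resulting (start, end) pair would then be labeled Hand Flapping or Normal by an annotator.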

The ARRBD dataset contains 780 clips across 10 behavioral categories. This expanded behavioral coverage increased the risk of overfitting, so task-specific datasets were derived from ARRBD and added to E-BED. On this improved dataset, recall jumped markedly from 40% to 60% across critical behaviors.

E-BED leverages generative AI to introduce label-preserving variation. Each video is annotated with precise event onset and offset times, event categories, and modality indicators, enabling both event classification and event detection tasks. In addition to overt behaviors such as repetitive motor actions and aggressive episodes, the dataset explicitly includes neutral and pre-event segments, allowing the study of early escalation cues and reducing bias toward extreme behaviors.
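An E-BED-style annotation record could look like the sketch below. The class name, field names, and category strings are illustrative assumptions; the real schema is internal and not disclosed.

```python
from dataclasses import dataclass

@dataclass
class BehavioralEvent:
    """One annotated event in an E-BED-style record (all field names are illustrative)."""
    video_id: str
    onset_s: float    # event onset time, in seconds
    offset_s: float   # event offset time, in seconds
    category: str     # e.g. "hand_flapping", "aggression", "neutral", "pre_event"
    modality: str     # "video", "audio", or "audio+video"

    @property
    def duration_s(self) -> float:
        return self.offset_s - self.onset_s

# A neutral pre-event segment immediately preceding an overt behavior,
# the kind of pairing that supports the study of early escalation cues:
events = [
    BehavioralEvent("clip_001", 0.0, 4.5, "pre_event", "video"),
    BehavioralEvent("clip_001", 4.5, 9.0, "hand_flapping", "audio+video"),
]
```

Onset/offset times support event detection (localizing events in time), while the category field supports event classification.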

| Dataset | Source | Scale | Annotation Style | Availability |
| --- | --- | --- | --- | --- |
| SSBD (2013) | Public online videos | 66 videos | Clip-level behavior labels | Public |
| ARRBD (2023) | Curated research corpus | 780 clips | Multi-class behavior labels | Public |
| EBD (2025) | Expanded SSBD variants | Varies | Refined / extended labels | Limited |
| E-BED (2026) | Ethan AI pilots + derived data | Ongoing | Temporal events, phases, multimodal | Internal |

E-BED is constructed using consented data from Ethan AI pilot deployments across therapy centers and individual homes. This dataset is supplemented by derived public sources where appropriate. Due to sensitive behavioral contexts, E-BED is not publicly available. Dataset scale and raw samples are not disclosed and are used exclusively for internal training, validation, and longitudinal evaluation under strict governance and privacy controls.

If you are a fellow researcher working in autism, autism diagnosis, behavioral intelligence, or assistive care, please reach out at hello@ethanai.in.

From Bag of Visual Words to Time Escalation Modeling

Rajagopalan et al. introduced an approach to analyzing self-stimulatory behaviors in Autism Spectrum Disorder (ASD) using a Bag of Visual Words (BoW) framework. In this method, videos are broken into frames, visual features (like motion or local image descriptors) are extracted, and these features are clustered to form a “visual vocabulary.” Each video is then represented as a histogram counting how often each visual pattern appears. This enables multi-class classification of behaviors using models such as SVMs. While effective for detecting repetitive patterns, BoW ignores temporal order: it captures how often something happens, but not how it evolves over time.
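The BoW pipeline described above can be sketched in a few lines. This is a toy illustration with random arrays standing in for real per-frame descriptors, and a hand-rolled k-means for the vocabulary; the vocabulary size and descriptor dimensions are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for per-frame descriptors (e.g. motion or local image features):
# one (n_frames x D) array per video.
videos = [rng.normal(size=(40, 8)) for _ in range(6)]

# 1. Build the "visual vocabulary": cluster all descriptors (simple k-means).
all_desc = np.vstack(videos)
k = 5
centroids = all_desc[rng.choice(len(all_desc), size=k, replace=False)]
for _ in range(10):
    # assign each descriptor to its nearest centroid
    dists = np.linalg.norm(all_desc[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # recompute centroids (keep the old one if a cluster empties)
    centroids = np.array([all_desc[labels == j].mean(axis=0)
                          if (labels == j).any() else centroids[j]
                          for j in range(k)])

# 2. Represent each video as a normalized histogram of visual-word counts.
def bow_histogram(desc: np.ndarray) -> np.ndarray:
    words = np.linalg.norm(desc[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
    hist = np.bincount(words, minlength=k).astype(float)
    return hist / hist.sum()   # normalize so videos of different lengths compare

histograms = np.array([bow_histogram(v) for v in videos])
# histograms (n_videos x k) would then feed a multi-class SVM.
```

Note how the histogram discards frame order entirely, which is exactly the temporal-blindness limitation discussed above.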

At Ethan AI we move beyond this limitation by adopting spatio-temporal methodologies. We model both spatial structure (what the body is doing in each frame) and temporal dynamics (how that behavior changes across seconds). Protocols such as 5-fold cross-validation under a Leave-One-Group-Out (LOGO) setting ensure subject-independent evaluation, where individuals in the training and testing sets are mutually exclusive. Performance is typically reported using the weighted F1-score to account for class imbalance in multi-class tasks.
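The LOGO splitting logic can be sketched directly (it is equivalent to what libraries like scikit-learn provide as `LeaveOneGroupOut`). The subject counts below are illustrative assumptions, not the real cohort sizes.

```python
import numpy as np

# Toy subject IDs: 5 subjects with 6 clips each (sizes are illustrative).
groups = np.repeat(np.arange(5), 6)

def leave_one_group_out(groups: np.ndarray):
    """Yield (train_idx, test_idx) pairs with one subject held out per fold,
    so no individual ever appears in both training and testing sets."""
    for g in np.unique(groups):
        yield np.flatnonzero(groups != g), np.flatnonzero(groups == g)

for train_idx, test_idx in leave_one_group_out(groups):
    # Subject-independent: the held-out subject's clips are unseen in training.
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])
    # A model would be fit on train_idx and scored on test_idx here, with
    # per-fold predictions aggregated into a weighted F1-score.
```

Holding out whole subjects, rather than random clips, prevents the model from exploiting subject-specific appearance cues and inflating its scores.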

We look for pointers like “a behavior intensified over 20 seconds”: sustained upward trends in measurable signals (e.g., motion energy, joint velocity, repetition rate) across consecutive time windows, relative to an individual’s baseline. Instead of treating behaviors as unordered counts, modern approaches analyze trajectories, capturing whether movement becomes faster, larger, or more erratic over time. This shift transforms the problem from simple action recognition to behavioral progression analysis, enabling Ethan AI to move toward early detection of escalation rather than post-hoc classification of isolated events.
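One way to operationalize “a behavior intensified over 20 seconds” is a windowed trend check against a per-individual baseline. The sketch below is a minimal assumption-laden illustration: the window length, margin, sample rate, and required run of windows are all arbitrary choices, not Ethan AI’s production thresholds.

```python
import numpy as np

def sustained_escalation(signal, baseline, window_s=4, fps=5,
                         min_windows=3, margin=1.2):
    """Flag a sustained upward trend: at least `min_windows` consecutive
    windows whose mean exceeds `margin` x the individual's baseline and is
    non-decreasing. All thresholds here are illustrative assumptions."""
    w = window_s * fps
    means = [signal[i:i + w].mean() for i in range(0, len(signal) - w + 1, w)]
    run, prev = 0, -np.inf
    for m in means:
        if m > margin * baseline and m >= prev:
            run += 1
            if run >= min_windows:
                return True   # escalation sustained across enough windows
        else:
            run = 0           # trend broken; start counting again
        prev = m
    return False

# Motion energy ramping up over ~20 s at 5 samples/s vs. a flat signal:
t = np.arange(100)
rising = 1.0 + 0.03 * t
flat = np.ones(100)
```

On the `rising` signal the window means climb monotonically above baseline, so the flag fires; on `flat` it never does. A real system would compute the input signal from pose or motion features rather than a synthetic ramp.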