## A synthetic intelligence strategy for choosing efficient instructor communication methods in autism training

### Data collection

A data set was formed through structured classroom observations in 20 full-day sessions over 5 months in 2019 at a special school in East London whose admission criteria require a diagnosis of ASC. Participants included three teachers (one male, two female), their teaching assistants (all female), and seven children (four male, three female) aged 6 to 12 years across three classes. The children’s P-scale levels range from P3 to P6. The P-scales commonly range from P1 to P8: P1–P3 are developmental, non-subject-specific levels, and P4–P8 correspond to the levels expected in typical development at ages 5–6 (ref. 48). Each child is also classified as either a social partner or a language partner under the SCERTS model used by the school; none of the participating students was classified as a conversational partner. The attributes of the student cohort are presented in Supplementary Table 3.

A coding protocol was developed iteratively with the participating teachers, and a grid was used to record observations of teacher-student interactions. The teachers’ comments and suggestions were taken into account across the successive drafts and in the final versions of both the coding protocol and the recording grid. For each observation instance, we recorded the student identifier, time stamp, teaching objective, teaching type, the context for this teaching type, the student’s observed emotional state, the teacher’s communication strategy, and the corresponding student response (outcome). Where applicable, we also recorded additional notes and the type of activity (e.g. yoga). Although the notes provided context for interpreting the data analysis as a whole, they were not used in our machine learning experiments because their free-form content was inconsistent across observations. Table 1 details all the subcategories that were considered as inputs to the machine learning models. Up to two teaching types and up to two teacher communications could be attributed to a single observation; each of the remaining categories takes exactly one subtype.

For example, an observation coded as “3, academic, giving instruction/modelling, whole class, positive, verbal/gesture, full response” (time stamp omitted) means that student no. 3, in a positive emotional state, fully responded to a teacher’s verbal and gestural instruction, delivered while teaching took place in a whole-class context, with the teaching type giving instruction/modelling and an overall academic objective. This could correspond to the teacher delivering a yoga lesson to the whole class: the teacher demonstrates a yoga move by gesturing while explaining it verbally and asks the students to do the same; the student then performs the move with an observably happy expression.
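To make the coding scheme concrete, the sketch below represents one coded observation as a Python record and enforces the cardinality rule described above (one or two teaching types and communications, exactly one subtype elsewhere). The class name, field names, and the response labels "full", "partial" and "none" are hypothetical choices for illustration, not the study's actual data format; the time stamp is omitted, as in the worked example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Observation:
    """One coded teacher-student interaction (hypothetical structure, time stamp omitted)."""
    student_id: int
    objective: str
    teaching_types: Tuple[str, ...]   # one or two subtypes
    context: str
    emotional_state: str
    communications: Tuple[str, ...]   # one or two subtypes
    response: str                     # hypothetical labels: "full", "partial" or "none"

    def __post_init__(self) -> None:
        # Only teaching type and communication may carry two subtypes.
        for name in ("teaching_types", "communications"):
            values = getattr(self, name)
            if not 1 <= len(values) <= 2:
                raise ValueError(f"{name} must contain one or two subtypes")
        if self.response not in {"full", "partial", "none"}:
            raise ValueError(f"unknown response: {self.response!r}")


# The worked example from the text; "giving instruction/modelling" is a single subtype.
example = Observation(
    student_id=3,
    objective="academic",
    teaching_types=("giving instruction/modelling",),
    context="whole class",
    emotional_state="positive",
    communications=("verbal", "gesture"),
    response="full",
)
```

Storing multi-valued categories as tuples avoids ambiguity with subtype names that themselves contain a slash.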

All adult-student interactions that the teachers permitted us to observe during the school day were recorded. The aim was to rapidly record situation-strategy-outcome data points “in vivo”, both inside and outside the classroom. Observations outside the classroom took place in the playground, library, music room, main hall, canteen, therapy rooms, and garden, all of which were used regularly throughout the observational sessions. The number of instances recorded per student ranged from 753 to 880 (μ = 780, σ = 45), giving a total of 5460 full observations.

### Statistical characterisation of collected data

Of the 5460 observations we collected, only 5001 are distinct. Ignoring the student’s response reduces the number of unique observations to 4880, and additionally ignoring the teacher’s communication strategy reduces it to 4357. Hence, some instances in our data overlap, which is expected given that teachers and students may behave similarly throughout a given teaching session. The support for each teacher communication strategy is 3128 (709) for verbal communication, 1717 (357) for using an object, 1642 (181) for gesture, 1465 (575) for a physical prompt, and 981 (165) for a picture, where the number in parentheses is the number of times that communication was the only one performed (out of a maximum of two per observation). Although the small student and teacher sample does not allow for generalisation, teachers engaged verbally with students frequently (in 57.29% of observations), either in combination with another communication or as the sole means of communication.

The full student response rate for each communication strategy (irrespective of co-occurrence with another one) is 64.02% (64.90%, 60.68%) for a picture, 60.92% (62.48%, 57.73%) for an object, 60.61% (64.34%, 53.56%) for a physical prompt, 57.67% (59.67%, 51.80%) for a gesture, and 53.20% (55.21%, 46.45%) for verbal communication; the rates in parentheses are breakdowns for the language and social partner SCERTS classifications, respectively, and reaffirm that language partners are in general more responsive, with a more pronounced relative difference when verbal communication or physical prompts are used. In addition, using two communications rather than one is more effective in producing a full student response: the full, partial, and no response breakdowns are 50.58%, 21.84%, and 27.58% for single communications, compared to 60.01%, 21.82%, and 18.17% for two teacher communications. Although the presence of two communications naturally increases the probability of choosing an appropriate means of interaction, this outcome supports the hypothesis that a less suitable communication strategy does not greatly affect the student when a more suitable one co-occurs. The observed features with the greatest bivariate correlation with the student response are the negative emotional state of the student (r = −0.184, p ≪ 0.001), the encouragement/praise teaching type (r = 0.124, p ≪ 0.001), and the redirection teaching type (r = −0.124, p ≪ 0.001).
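The descriptive statistics above can be computed from coded records along the following lines. This is a sketch on a handful of toy records (not the study's data), assuming each observation stores its one or two communication strategies as a tuple and its response as one of three hypothetical labels. The correlation helper computes Pearson's r between a 0/1 feature and a 0/1 full-response indicator (for two binary variables this equals the phi coefficient); whether the reported correlations used exactly this binarisation is an assumption.

```python
import numpy as np

STRATEGIES = ("verbal", "object", "gesture", "physical prompt", "picture")


def strategy_summary(communications, responses):
    """Per-strategy support, sole-use count and full-response rate."""
    summary = {}
    for s in STRATEGIES:
        used = [s in c for c in communications]
        n_used = sum(used)
        # Observations where s was the only communication performed.
        n_sole = sum(1 for c in communications if tuple(c) == (s,))
        # Full responses whenever s was used, with or without a second strategy.
        n_full = sum(1 for u, r in zip(used, responses) if u and r == "full")
        summary[s] = {
            "support": n_used,
            "sole": n_sole,
            "full_rate": n_full / n_used if n_used else float("nan"),
        }
    return summary


def response_breakdown(communications, responses, n_comm):
    """Share of full/partial/no responses among observations with n_comm strategies."""
    subset = [r for c, r in zip(communications, responses) if len(c) == n_comm]
    return {lab: subset.count(lab) / len(subset) for lab in ("full", "partial", "none")}


def pearson_r(feature, outcome):
    """Pearson correlation of two 0/1 vectors (equivalently, the phi coefficient)."""
    return float(np.corrcoef(np.asarray(feature, float), np.asarray(outcome, float))[0, 1])


# Toy records for illustration only; these are not the study's data.
comms = [("verbal",), ("verbal", "gesture"), ("picture",),
         ("object", "verbal"), ("physical prompt",), ("verbal",)]
resps = ["none", "full", "full", "full", "partial", "full"]
summary = strategy_summary(comms, resps)
negative = [1, 0, 0, 0, 1, 0]              # hypothetical "negative emotional state" flag
full = [int(r == "full") for r in resps]
r_neg = pearson_r(negative, full)
```

A p-value for each correlation could be obtained with a standard test such as `scipy.stats.pearsonr`, which is not shown here.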

### Student response (outcome) classification with machine learning

A machine learning classification task aims to learn a function $f: \mathbf{X} \to \mathbf{y}$, where $\mathbf{X} \in \mathbb{R}^{m \times n}$ and $\mathbf{y} \in \{1, \dots, k\}^{m}$ denote the observations (inputs) and the response variable (outcomes), respectively; $m$ is the number of observations (and hence of outcomes), $n$ the number of observation categories (features), and $k$ the number of outcome classes. In the most feature-inclusive case, we define $\mathbf{X}$ as an aggregation of six feature categories, namely student attributes (age, sex, P-level, SCERTS classification), the teaching objective, teaching type, context for the teaching type, the student’s observed emotional state, and the teacher’s communication strategy. All feature categories apart from age were coded as $c$-dimensional tuples of 1s and 0s, where $c$ is the number of subtypes in the respective category (Table 1) and ones denote the activated subtype(s). Student age was coded as a real number from 0 to 1 using a linear mapping, where 0 and 1 represent 5 and 12 years of age, respectively. The response variable $\mathbf{y}$ is binary, distinguishing a full response from any other outcome. The rationale behind this merging was to create a more balanced classification task (56.59% of labels are full student responses) and to alleviate issues arising from miscategorising partial (21.86%) or no response (21.55%) outcomes.
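The encoding described above can be sketched as follows: each categorical feature becomes a $c$-dimensional multi-hot vector, age is mapped linearly onto $[0, 1]$, and the three recorded outcomes collapse into a binary label. The subtype list below contains only the five communication strategies named in the text, and the helper names and the age of 9 are hypothetical; the full subtype lists per category are in Table 1.

```python
import numpy as np

# Hypothetical subtype list; the study's full list per category is in Table 1.
COMMUNICATIONS = ["verbal", "object", "gesture", "physical prompt", "picture"]


def multi_hot(active, subtypes):
    """c-dimensional 0/1 vector with ones at the activated subtype(s)."""
    vec = np.zeros(len(subtypes))
    for name in active:
        vec[subtypes.index(name)] = 1.0
    return vec


def scale_age(age_years, lo=5.0, hi=12.0):
    """Linear mapping of age onto [0, 1], with 5 years -> 0 and 12 years -> 1."""
    return (age_years - lo) / (hi - lo)


def binary_label(response):
    """1 for a full response; partial and no response are merged into class 0."""
    return int(response == "full")


# Encode the communication block of the worked example plus a hypothetical age.
x_comm = multi_hot(["verbal", "gesture"], COMMUNICATIONS)   # [1, 0, 1, 0, 0]
x_row = np.concatenate([[scale_age(9)], x_comm])           # age first, then communications
y = binary_label("partial")                                 # 0
```

A full feature row would concatenate the multi-hot blocks of all six categories in a fixed order; the order itself is an arbitrary but consistent choice.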

We train several machine learning classifiers to predict the student’s type of response and evaluate their performance. We deploy three classifiers widely used in the literature: (a) a variant of logistic regression (LR)55 that uses elastic net regularisation56 for feature selection, (b) a random forest (RF)57 with 2000 decision trees, and (c) a Gaussian process (GP)58 with a composite covariance function (or kernel), described below. We devise three problem formulations, in which we incrementally add elements of the observed data to the input. In the first, we consider all observed categories apart from student attributes. In the second, we also include student attributes in the feature space; to denote this change, we append “-α” to the method abbreviations. Finally, in both previous setups, we explore autoregression by also including the observed data and student responses of up to τ = 5 previous teacher-student interactions. When performing autoregression, we keep all three types of recorded student responses (full, partial, and no response) in the input data.
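The autoregressive input can be built by appending, to each current observation, the features and three-class responses of the τ preceding interactions. The sketch below assumes that the history is taken within one student's time-ordered sequence, that missing history at the start of a sequence is zero-padded, and that columns are laid out as [current | past observations | past responses]; all three are illustrative assumptions, as are the function and label names.

```python
import numpy as np

RESPONSES = ("full", "partial", "none")  # hypothetical labels for the three outcomes


def one_hot_response(label):
    """Three-class one-hot encoding, so past responses keep all three types."""
    vec = np.zeros(len(RESPONSES))
    vec[RESPONSES.index(label)] = 1.0
    return vec


def add_lags(X, responses, tau=5):
    """Append the tau previous observations and responses to every row.

    X: (m, n) features of ONE student, rows in time order.
    responses: the m response labels for the same rows.
    Returns an (m, n + tau * n + tau * 3) matrix laid out as
    [current | past observations (lag 1..tau) | past responses (lag 1..tau)].
    Rows without enough history are zero-padded (an assumption).
    """
    X = np.asarray(X, dtype=float)
    Y = np.stack([one_hot_response(r) for r in responses])
    x_past, y_past = [], []
    for lag in range(1, tau + 1):
        x_lag = np.zeros_like(X)
        y_lag = np.zeros_like(Y)
        x_lag[lag:] = X[:-lag]   # row t receives row t - lag
        y_lag[lag:] = Y[:-lag]
        x_past.append(x_lag)
        y_past.append(y_lag)
    return np.hstack([X] + x_past + y_past)


# Toy sequence of three interactions with two features each, using tau = 2.
Z = add_lags(np.arange(6).reshape(3, 2), ["full", "none", "partial"], tau=2)
```

Note that the current response is never part of its own row's input; only earlier responses are used, so the label of the prediction target does not leak into the features.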

Whereas logistic regression and random forests handle the enlarged input space without any intrinsic additive structure, the modularity of the GP allows us to specify covariance functions customised to these different groups of inputs. A GP model places a probability distribution over functions, denoted $f(\mathbf{x}) \sim \mathrm{GP}\left(\mu(\mathbf{x}), k(\mathbf{x}, \mathbf{x}^{\prime})\right)$, where $\mathbf{x}, \mathbf{x}^{\prime}$ are rows of $\mathbf{X}$, $\mu(\cdot)$ is the mean function of the process, and $k(\cdot, \cdot)$ is the covariance function (or kernel) that captures statistical relationships in the input space. We assume that $\mu(\mathbf{x}) = 0$, a common setting in various downstream applications59,60,61,62, and use the following incremental (through summation) covariance functions:

$$k({{{bf{x}}}},{{{bf{x}}}}^{prime} )={k}_{{{{rm{SE}}}}}({{{{bf{x}}}}}_{c},{{{{bf{x}}}}}_{c}^{prime}) ,$$

(1)

$$k({{{bf{x}}}},{{{bf{x}}}}^{prime} )={k}_{{{{rm{SE}}}}}({{{bf{a}}}},{{{bf{a}}}}^{prime} )+{k}_{{{{rm{SE}}}}}({{{{bf{x}}}}}_{c},{{{{bf{x}}}}}_{c}^{prime}) ,$$

(2)

$$k({{{bf{x}}}},{{{bf{x}}}}^{prime} )={k}_{{{{rm{SE}}}}}({{{{bf{x}}}}}_{c},{{{{bf{x}}}}}_{c}^{prime})+{k}_{{{{rm{SE}}}}}({{{{bf{x}}}}}_{p},{{{{bf{x}}}}}_{p}^{prime})+{k}_{{{{rm{SE}}}}}({{{{bf{y}}}}}_{p},{{{{bf{y}}}}}_{p}^{prime}) ,,{{mbox{and}}},$$

(3)

$$k({{{bf{x}}}},{{{bf{x}}}}^{prime} )={k}_{{{{rm{SE}}}}}({{{bf{a}}}},{{{bf{a}}}}^{prime} )+{k}_{{{{rm{SE}}}}}({{{{bf{x}}}}}_{c},{{{{bf{x}}}}}_{c}^{prime})+{k}_{{{{rm{SE}}}}}({{{{bf{x}}}}}_{p},{{{{bf{x}}}}}_{p}^{prime})+{k}_{{{{rm{SE}}}}}({{{{bf{y}}}}}_{p},{{{{bf{y}}}}}_{p}^{prime}) ,$$

(4)

where $k_{\mathrm{SE}}(\cdot, \cdot)$ denotes the squared exponential covariance function, $\mathbf{x}_c$ the current observation (including the teacher’s communication strategy), $\mathbf{a}$ the vector of student attributes, and $\mathbf{x}_p$, $\mathbf{y}_p$ the τ past observations and student response outcomes, respectively. Thus, Eq. (1) is the kernel for the simplest task formulation, in which only currently observed data are used; Eq. (2) extends Eq. (1) with a kernel on student attributes; and Eqs. (3) and (4) extend Eqs. (1) and (2), respectively, with kernels on previous observations and student responses (autoregression). An additive formulation, in which each kernel acts on one part of the feature space, yields a simpler optimisation task and tends to provide better accuracy63. This is also confirmed by our empirical results.
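The additive kernels in Eqs. (1)–(4) can be sketched in NumPy by summing squared exponential kernels, $k_{\mathrm{SE}}(\mathbf{u}, \mathbf{u}^{\prime}) = \sigma^2 \exp\left(-\lVert \mathbf{u} - \mathbf{u}^{\prime} \rVert^2 / (2\ell^2)\right)$, each evaluated on its own block of input columns. The column layout, the use of one isotropic lengthscale $\ell$ and variance $\sigma^2$ per block, and all numeric values below are illustrative assumptions; fitting these hyperparameters (for example via the Laplace approximation) is not shown.

```python
import numpy as np


def se_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared exponential kernel matrix between the rows of A and B."""
    sq_dist = (np.sum(A ** 2, axis=1)[:, None]
               + np.sum(B ** 2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    # Clip tiny negative values caused by floating-point round-off.
    return variance * np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * lengthscale ** 2))


def additive_kernel(X1, X2, blocks, params):
    """Sum of SE kernels, each restricted to its own block of input columns.

    blocks: name -> column slice, e.g. "a", "x_c", "x_p", "y_p".
    params: name -> (lengthscale, variance) for that block's SE kernel.
    """
    K = np.zeros((X1.shape[0], X2.shape[0]))
    for name, cols in blocks.items():
        lengthscale, variance = params[name]
        K += se_kernel(X1[:, cols], X2[:, cols], lengthscale, variance)
    return K


# Hypothetical column layout for Eq. (4): [a | x_c | x_p | y_p].
blocks_eq4 = {"a": slice(0, 2), "x_c": slice(2, 5), "x_p": slice(5, 8), "y_p": slice(8, 10)}
params = {"a": (1.0, 0.5), "x_c": (1.0, 1.0), "x_p": (2.0, 0.3), "y_p": (2.0, 0.3)}
X = np.random.default_rng(0).random((6, 10))
K4 = additive_kernel(X, X, blocks_eq4, params)
# Eq. (1) keeps only the current-observation block.
K1 = additive_kernel(X, X, {"x_c": blocks_eq4["x_c"]}, params)
```

Because a sum of valid kernels is itself a valid kernel, each of Eqs. (1)–(4) yields a positive semi-definite covariance matrix; the alternative, a single SE kernel on the concatenated input, would couple all feature groups through one distance.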

### Training and evaluating classifiers

We apply 10-fold cross-validation as follows. We randomly shuffle the observed samples (5460 in total) and split them into 10 equally sized folds. We use 9 folds to train a model and the remaining fold to test it, repeating this process 10 times so that every fold serves once as the test set. Consequently, observations from the same student can appear in both the training and the test sets (although each observation instance is assigned to only one of the two). This was a necessary compromise given the limited number of students (7). The exact same training and testing process (with identical data splits) is used for all classification models and problem formulations. The regularisation hyperparameters of logistic regression are learned by cross-validation on the training data, which may result in different choices for each fold. The hyperparameters of the GP models are learned using the Laplace approximation58,64. Performance is assessed using standard classification metrics, namely accuracy, precision, recall, and the harmonic mean of precision and recall, known as the F1 score. For completeness, we also assess the best-performing model by testing on data from a single student excluded from the training set, repeating this process for every student in our cohort (leave-one-student-out, 7-fold cross-validation; see SI for more details).
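The evaluation protocol can be sketched as a small NumPy harness: shuffle once, cut the indices into folds, reuse the same folds for every model, and compute accuracy, precision, recall and F1 with the full response as the positive class. The same harness covers the leave-one-student-out setting by building one fold per student. Averaging the metrics over folds (rather than pooling predictions), the random seed, and the majority-class baseline are assumptions made for illustration; any object with scikit-learn style `fit`/`predict` methods can be plugged in.

```python
import numpy as np


def kfold_indices(m, k=10, seed=0):
    """Shuffle the m sample indices once and cut them into k (near-)equal folds."""
    perm = np.random.default_rng(seed).permutation(m)
    return np.array_split(perm, k)


def leave_one_student_out(student_ids):
    """One fold per student: all rows of that student form the test set."""
    student_ids = np.asarray(student_ids)
    return [np.flatnonzero(student_ids == s) for s in np.unique(student_ids)]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1, treating 'full response' (1) as positive."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": float(np.mean(y_true == y_pred)),
            "precision": precision, "recall": recall, "f1": f1}


def cross_validate(make_model, X, y, folds):
    """Train on all folds but one, test on the held-out fold, average over folds.

    make_model: zero-argument factory returning a fresh object with fit/predict.
    Reusing the same `folds` for every model keeps the data splits identical.
    """
    per_fold = []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        model = make_model()
        model.fit(X[train], y[train])
        per_fold.append(binary_metrics(y[test_idx], model.predict(X[test_idx])))
    return {name: float(np.mean([s[name] for s in per_fold])) for name in per_fold[0]}


class MajorityClass:
    """Trivial baseline that always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = int(np.bincount(y).argmax())
        return self

    def predict(self, X):
        return np.full(len(X), self.label_)


# Toy usage with the baseline; a real run would pass the study's X and y.
X_toy = np.zeros((20, 2))
y_toy = np.array([1] * 15 + [0] * 5)
scores = cross_validate(MajorityClass, X_toy, y_toy, kfold_indices(len(y_toy), k=5))
```

With scikit-learn installed, factories such as `lambda: LogisticRegressionCV(penalty="elasticnet", solver="saga", l1_ratios=[0.1, 0.5, 0.9], max_iter=5000)` (which cross-validates the regularisation strength and mixing on the training folds only) or `lambda: RandomForestClassifier(n_estimators=2000)` fit the same interface; the `l1_ratios` grid shown is hypothetical.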

### Ethics approval

Ethical approval was granted by the Research Ethics Committee at the Institute of Education, University College London (United Kingdom), where the research was conducted. The parents/guardians of the participating children, the school management, and the participating teachers gave written informed consent. All participant information has been anonymised. Raw data and derived data sets were securely stored on the researchers’ encrypted, password-protected computer systems.

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.