This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (http://creativecommons.org/licenses/by/4.0/) You are free to copy, distribute and transmit the work, provided the original author and source are credited.

Statistical learning starts at an early age and is intimately linked to brain development and the emergence of individuality. Over such a long period of statistical learning, the brain updates and constructs statistical models, and the model's individuality changes depending on the type and degree of stimulation received. However, the detailed mechanisms underlying this process are unknown. This paper discusses three main points of statistical learning, including 1) cognitive individuality based on "

Understanding cognitive individuality and its underlying creativity is crucial for advancing our understanding of human cognition. One critical cognitive function that contributes to language and music acquisition is known as “statistical learning” (Saffran et al., 1996).

Recent studies have suggested that individual differences in statistical learning are linked to various cognitive abilities and developmental disorders such as autism spectrum disorder (ASD) and developmental dyslexia (Misyak and Christiansen, 2012).

Here, we review neural and computational studies on how cognitive individuality emerges through statistical learning in the brain. Further, for constructive understanding, we conducted a simulation experiment to visualize the temporal dynamics of perception and production processes through statistical learning in different cognitive models. We utilized three models that have varying levels of sensitivity to sound stimuli: hypo-sensitive, normal-sensitive, and hyper-sensitive models. Considering that statistical learning is fundamental to brain development, we also discuss how typical versus atypical brain development influences the perception and production of information through statistical learning.

Recently, a growing body of studies has tried to explain the neural and computational mechanisms of learning and generation of auditory structured information (such as music and language) based on the general principle of predictive processing in the brain (Vuust et al., 2022).

Researchers have attempted to understand cognitive individuality from the perspective of predictive processing in the brain. For example, it has been explained by the dependence on top-down predictions based on the prior probability of internal models (hypo-/hyper-prior) and the dependence on bottom-up sensory signals from the external environment (hypo-/hyper-sensitive) (Pellicano and Burr, 2012).

Statistical learning is an essential cognitive function that is closely linked to brain development (Saffran, 2018). Through statistical learning, the brain predicts a forthcoming event e_{n+1}, given the preceding n events, by computing the conditional probability P(e_{n+1}|e_{n}) based on Bayes' theorem.

−Σ_{i} P(x_{i}) Σ_{i+1} P(x_{i+1}|x_{i}) log_{2} P(x_{i+1}|x_{i})

From the psychological standpoint, the formula can be construed as positing that the brain expects a forthcoming event e_{n+1} based on the most recent preceding events e_{n} in a given sequence.
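As an illustration, the conditional entropy above can be computed for a toy first-order Markov model. The transition and marginal probabilities below are hypothetical, chosen only to demonstrate the calculation:

```python
import math

# Hypothetical transition probabilities P(x_{i+1} | x_i) for a two-state sequence.
trans = {
    "A": {"A": 0.9, "B": 0.1},  # after A, A is very likely -> low uncertainty
    "B": {"A": 0.5, "B": 0.5},  # after B, both events are equally likely -> high uncertainty
}
p_state = {"A": 0.6, "B": 0.4}  # assumed marginal probabilities P(x_i)

# Conditional entropy: -sum_i P(x_i) sum_{i+1} P(x_{i+1}|x_i) log2 P(x_{i+1}|x_i)
h = -sum(
    p_state[s] * sum(p * math.log2(p) for p in trans[s].values() if p > 0)
    for s in trans
)
print(round(h, 3))  # uncertainty of the sequence, in bits
```

A fully deterministic sequence would yield 0 bits, while maximal unpredictability over two states would yield 1 bit per transition.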

The prediction strategy and resulting impression can vary depending on uncertainty, even when the transition probabilities are identical. For example, a recent neural study has revealed that the brain strategically alters the “

Such uncertainty is not universally inherent in music or language per se. Rather, it is “perceptual” uncertainty that is shaped by an individual's auditory experience. For example, in the case of language, when a native speaker hears a particular word, the uncertainty in predicting the probable subsequent words is low, making prediction easier. Conversely, for non-native speakers, predicting the next word is more difficult due to higher uncertainty. This is a result of individuals constantly updating their internal models through extended periods of statistical learning, thereby generating an appropriate language probability model. Neural and behavioral studies have highlighted the impact of an individual's auditory experience and expertise on statistical learning abilities (Daikoku and Yumoto, 2020).

Importantly, auditory experience affects not only perceptual uncertainty, but also the "reliability" of probabilities. For instance, an A-to-B transition which occurs in (1) 9 out of 10 trials and in (2) 90 out of 100 trials both have a transition probability of 90%. However, the degree of reliability is higher in the latter case than in the former. Such reliability is useful for the brain to make judgments even for events with low transition probability. Comparing an event that occurs in 10 out of 100 trials with an event that occurs in 1 out of 10, the brain will recognize that the former is reliably unpredictable and can confidently use this information to make predictions.
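The 9/10 versus 90/100 contrast can be made concrete with a Beta posterior over the transition probability: the posterior mean is (nearly) the same, but the variance, the inverse of the reliability, shrinks as trials accumulate. This sketch assumes a uniform Beta(1, 1) prior, which is an illustrative choice rather than the authors' exact model:

```python
def beta_posterior_variance(successes, trials, a0=1.0, b0=1.0):
    """Variance of the Beta(a0 + successes, b0 + failures) posterior."""
    a = a0 + successes
    b = b0 + (trials - successes)
    return (a * b) / ((a + b) ** 2 * (a + b + 1))

v_small = beta_posterior_variance(9, 10)    # 9 A-to-B transitions in 10 trials
v_large = beta_posterior_variance(90, 100)  # 90 in 100: same probability, more evidence
print(v_small, v_large)  # the larger sample gives the smaller variance (higher reliability)
```

The same logic explains why an event seen in 10 of 100 trials is "reliably unpredictable": its low probability is estimated with low variance.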

Neurophysiological studies have observed a gradual representation of statistical learning effects as the number of learning repetitions increases (Daikoku et al., 2015).

However, most studies of statistical learning have relied on maximum likelihood estimation based on Markov models or n-gram models that do not consider the "reliability" of probabilities, and thus have not taken into account the effect of the number of learning trials. Therefore, this study developed a novel model, referred to as a “Hierarchical Bayesian Statistical Learning (HBSL)” model, incorporating the Bayesian reliability of probabilities into a Markov model. We then used this model to examine the learning process when a specific auditory stimulus sequence is repetitively learned.

It is of note that the reliability of probabilities is not only subject to the amount of learning (experience), but also to prediction biases. As mentioned above (section

Statistical learning has basically been derived from a hypothesis that explains the mechanism of chunking, i.e., the detection of information units with high transition probabilities (such as words or phrases) from sequential information (Saffran et al., 1996).
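The chunking idea can be sketched as segmenting a stream of syllables wherever the bigram transition probability drops, since transitions within a word are far more probable than transitions across a word boundary. The syllable stream, the two pseudo-words, and the 0.75 cut-off below are all hypothetical, chosen only to illustrate the mechanism:

```python
from collections import Counter

# Hypothetical stream: two pseudo-words ("tokibu", "gela") in varying order,
# in the style of Saffran-type artificial-language streams.
words = ["tokibu", "gela", "tokibu", "tokibu", "gela", "gela", "tokibu"]
stream = "".join(words)

# Estimate forward transition probabilities P(next | current) from bigram counts.
bigrams = Counter(zip(stream, stream[1:]))
unigrams = Counter(stream[:-1])
tp = {(a, b): c / unigrams[a] for (a, b), c in bigrams.items()}

# Place a chunk boundary wherever the transition probability dips below the cut-off.
threshold = 0.75  # assumed value; within-word transitions here all have TP = 1.0
chunks, current = [], [stream[0]]
for a, b in zip(stream, stream[1:]):
    if tp[(a, b)] < threshold:
        chunks.append("".join(current))
        current = []
    current.append(b)
chunks.append("".join(current))
print(chunks)  # recovers the pseudo-words as chunks
```

Because word order varies, cross-boundary transitions (e.g., u→g, a→t) have probabilities well below 1.0, so the dips coincide with the word boundaries.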

Particularly, the hierarchical structure of auditory rhythms has been considered important for the acquisition of music and language (Goswami, 2017).

It is known that human auditory perception relies in part on phase entrainment to the AM rhythm patterns in sounds at different timescales simultaneously. Such phase entrainment (also described as phase alignment, neural coupling, tracking, and synchronization) has been shown to contribute to the parsing of the sound signal into units such as syllables and words (Poeppel, 2003).

A recent study has investigated the acquisition of the slower rhythm (1-3 Hz), that is, phase entrainment to the 1-3 Hz rhythm (Attaheri et al., 2022).

However, brain development can interfere with this function of phase entrainment through statistical learning (Smalle et al., 2022).

To provide a constructive understanding of the potential relationships between statistical learning and the acquisition of 1-3 Hz rhythms, in the next section we conduct a simulation experiment to visualize the temporal dynamics of perception and production processes through statistical learning. We use a newly devised model, referred to as the HBSL model, that takes into account both reliability and hierarchy, with three variants differing in their dependence or reliability on bottom-up sensory stimuli relative to top-down prior prediction: hypo-sensitive, normal-sensitive, and hyper-sensitive models, mimicking the statistical learning processes of brains with different cognitive individuality. We then discuss how atypical cognitive development and individuality (i.e., hypo- and hyper-sensitivity) influence perception and production through statistical learning.

This study developed a computational model, which simulates statistical learning processes of the brain, referred to as the HBSL model (Daikoku and Nagai, 2022).

Hypo-sensitive:

Normal-sensitive:

Hyper-sensitive:

where each α_i corresponds to the prior probability of category i. Specifically, for K categories, α is a K-dimensional vector of positive real numbers. Although the degree of updating transition probabilities remains constant among the three models, differences emerge in terms of the changes in the reliability of transition probability. In the hyper-sensitive model, the reliability of probabilities (variance of prior distribution) varies easily depending on the sensory input, while in the hypo-sensitive model, the reliability of probabilities is less likely to vary even when a new input is provided (eight times weaker than in the hyper-sensitive model). The normal-sensitive model is an intermediate model between the hyper- and hypo-sensitive models in terms of sensitivity to sensory input.
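The point that the models can share identical transition probabilities while differing in reliability follows directly from the Dirichlet parameterization: scaling the whole concentration vector α leaves the mean unchanged but shrinks the variance. The numbers below are illustrative, not the authors' exact parameters:

```python
def dirichlet_mean_and_var(alpha):
    """Per-component mean and variance of a Dirichlet distribution."""
    total = sum(alpha)
    mean = [a / total for a in alpha]
    var = [a * (total - a) / (total ** 2 * (total + 1)) for a in alpha]
    return mean, var

# Same mean transition probabilities (0.5, 0.3, 0.2), but the second vector
# carries 8x the concentration, mimicking a model whose reliability has grown
# faster under the same amount of input.
m1, v1 = dirichlet_mean_and_var([5.0, 3.0, 2.0])
m2, v2 = dirichlet_mean_and_var([40.0, 24.0, 16.0])
print(m1, m2)  # (near-)identical means
print(v1, v2)  # the high-concentration variances are much smaller: higher reliability
```

In this framing, sensitivity governs how quickly observations inflate the concentration, and hence how quickly the variance (the inverse of reliability) collapses, without altering the estimated transition probabilities themselves.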

We generated fifteen different models by manipulating the degree of dependence on sensory signals and the amount of learning. We used the MIDI data of the Japanese children's song "Yuuyake Koyake" as the training data, and repeated the learning of the song one to five times using each of the three models (hypo-, normal-, and hyper-sensitive). As a result, a total of fifteen models were generated, covering three degrees of dependence on sensory signals and five amounts of learning. We investigated how each of the hypo-, normal-, and hyper-sensitive models transforms the internal model over five trials of learning. Furthermore, using the probability distributions of these fifteen models, a hundred pieces of music were probabilistically generated for each model through an automatic composition process (Daikoku and Nagai, 2022).

We compared the total Bayesian surprise (or total prediction errors) that occurred during learning, measured by the Kullback-Leibler divergence between a distribution P(x) before learning an event (e_{n}) and a distribution Q(x) after learning the event (e_{n+1}), as well as the total number of chunks generated during five trials of statistical learning. The Kullback-Leibler divergence has often been used to measure prediction error or Bayesian surprise in the framework of predictive processing of the brain (Friston, 2010).

D_{KL}(Q‖P) = Σ_{i} Q(i) log_{2} (Q(i) / P(i))

Here, P(i) and Q(i) represent the probabilities of selecting the value i according to the probability distributions P and Q, respectively. In addition, we calculated the average probability distribution of the 100 songs generated by each model and compared the similarity of the models to the training data (i.e., original data) using t-distributed stochastic neighbor embedding (tSNE).
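The Kullback-Leibler divergence between the pre- and post-learning distributions can be computed directly from the formula above. The two toy distributions here are hypothetical:

```python
import math

def kl_divergence(q, p):
    """D_KL(Q || P) in bits: the Bayesian surprise of updating P to Q."""
    return sum(qi * math.log2(qi / pi) for qi, pi in zip(q, p) if qi > 0)

p_before = [0.25, 0.25, 0.25, 0.25]  # prior over four possible next events
q_after = [0.70, 0.10, 0.10, 0.10]   # posterior after observing new evidence
surprise = kl_divergence(q_after, p_before)
print(round(surprise, 3))
```

When learning converges, Q barely differs from P and the divergence approaches zero, which is why the total divergence across trials tracks how much updating (prediction error) the learning induced.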

We converted the MIDI data of the 100 songs generated by each model into WAV format and extracted the rhythm waveform (modulation wave) below 15 Hz using the Bayesian probabilistic amplitude modulation model (PAD; Turner and Sahani, 2011).

y_{t} = c_{t} × m_{t}, where y_{t} is the signal, c_{t} the carrier, and m_{t} the modulator at time t.

PAD employs amplitude demodulation as a process of both learning and inference. Learning involves the estimation of parameters that describe distributional constraints, such as the expected timescale of variation of the modulator. Inference involves estimating the modulator and carrier from the signals based on learned or manually defined parametric distributional constraints. This information is probabilistically encoded in the likelihood function P(y_{1:T}|c_{1:T}, m_{1:T}, θ) and in the prior distributions over the carrier P(c_{1:T}|θ) and the modulator P(m_{1:T}|θ), where y_{1:T} represents all the samples of the signal y, ranging from 1 to a maximum value T. Each of these distributions depends on a set of parameters θ, which control factors such as the typical timescale of variation of the modulator or the frequency content of the carrier. In more detail, the parametrized joint probability of the signal, carrier, and modulator is:

P(y_{1:T}, c_{1:T}, m_{1:T}|θ) = P(y_{1:T}|c_{1:T}, m_{1:T}, θ) P(c_{1:T}|θ) P(m_{1:T}|θ)

Bayes' theorem is applied for inference, forming the posterior distribution over the modulators and carriers, given the signal:

P(c_{1:T}, m_{1:T}|y_{1:T}, θ) = P(y_{1:T}, c_{1:T}, m_{1:T}|θ) / P(y_{1:T}|θ)

The full solution to PAD is a distribution over the possible pairs of modulators and carriers. The most probable pair of modulator and carrier given the signal is returned:

m*_{1:T}, c*_{1:T} = argmax_{m_{1:T}, c_{1:T}} P(c_{1:T}, m_{1:T}|y_{1:T}, θ)

PAD utilizes Bayesian inference to estimate the most suitable modulator (i.e., envelope) and carrier that best align with the data and a priori assumptions. The resulting solution takes the form of a probability distribution, which describes the likelihood of a specific setting of modulator and carrier given the observed signal. Thus, PAD summarizes the posterior distribution by returning the specific envelope and carrier with the highest posterior probability, thereby providing the best fit to the data.

PAD can be run recursively using different demodulation parameters each time, producing a cascade of amplitude modulators at different oscillatory rates to form an AM hierarchy. The positive slow envelope is modeled by applying an exponential nonlinear function to a stationary Gaussian process, resulting in a positive-valued envelope with a constant mean over time. The degree of correlation between points in the envelope can be constrained by the timescale parameters of variation of the modulator (i.e., envelope), which can either be manually entered or learned from the data.
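PAD itself performs Bayesian inference over modulators and carriers, but the core idea of amplitude demodulation can be illustrated with a much simpler, non-Bayesian stand-in: rectify the signal and smooth it with a moving average whose window length plays the role of the modulator's timescale parameter. The signal, frequencies, and window below are assumptions for illustration only:

```python
import math

def envelope(signal, window):
    """Crude amplitude demodulation: full-wave rectification + moving average.
    The window length stands in for the modulator's expected timescale."""
    rectified = [abs(x) for x in signal]
    half = window // 2
    out = []
    for i in range(len(rectified)):
        lo, hi = max(0, i - half), min(len(rectified), i + half + 1)
        out.append(sum(rectified[lo:hi]) / (hi - lo))
    return out

# A slow 2 Hz modulator imposed on a 50 Hz carrier, sampled at 1000 Hz.
fs = 1000
t = [i / fs for i in range(fs)]
modulator = [0.5 * (1 + math.sin(2 * math.pi * 2 * ti)) for ti in t]
carrier = [math.sin(2 * math.pi * 50 * ti) for ti in t]
signal = [m * c for m, c in zip(modulator, carrier)]

env = envelope(signal, window=40)  # ~2 carrier cycles: averages out the 50 Hz
# env now rises and falls at roughly the 2 Hz modulator rate.
```

Running such a demodulation recursively with progressively longer windows would yield envelopes at slower and slower rates, which is the intuition behind PAD's cascade, though PAD additionally returns principled posterior estimates rather than a fixed filter output.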

In the present study, we manually entered the PAD parameters to produce the modulators at an oscillatory band level (i.e., <10 Hz) isolated from a carrier at a higher frequency rate (>10 Hz). The carrier reflects components, including noise and pitches, whose frequencies are much higher than those of the core modulation bands. In each sample, the modulators (envelopes) were converted into time-frequency domains using scalograms (Figure 2).

This study suggests that the Hypo-sensitive model had the highest total Bayesian surprise or total prediction error (i.e., Kullback-Leibler divergence) during learning, followed by the Normal-sensitive model and the Hyper-sensitive model (Figure 3).

In terms of acoustic features of composed music after learning, both the Hyper-sensitive model and Normal-sensitive model showed a gradual increase in the 2 Hz rhythm, which corresponds to short phrases that are considered important in the initial learning of auditory sequences (such as music or language) (Figure 5). On the other hand, rhythms corresponding to notes or beats in the 3-5 Hz range gradually decreased with learning. In contrast, the Hypo-sensitive model showed a gradual decrease in the 2 Hz rhythm and a gradual increase in the 3-5 Hz rhythm with learning.

Regarding probability distribution, the tSNE analysis showed that the similarity of the probability distribution of the composed music to the original music was highest for the Hyper-sensitive model, followed by the Normal-sensitive model and the Hypo-sensitive model, in that order (Figure 4).

Statistical learning is a fundamental process for brain development and contributes to forming individual differences in perception and production (Siegelman et al., 2017).

In particular, the normal- and hyper-sensitive models gradually reduced Bayesian surprise and increased the number of chunks (Figure 3).

On the other hand, the hypo-sensitive model produced music with statistically different characteristics from those of the training data (i.e., original music), compared to the other models (Figure 5).

This study demonstrated that statistical learning contributes to the acquisition of rhythms around 1-3 Hz (Figure 5).

A recent study has examined the neural processing of the slower rhythm, that is, oscillatory phase entrainment to the 1-3 Hz rhythm (Attaheri et al., 2022).

As stated in the Introduction section, two types of “

Past studies suggest that as the brain develops, neurotypical individuals transition from relying heavily on sensory input statistics while giving less weight to prior predictions (known as hypo-prior or hyper-sensitivity) to properly integrating sensory statistics with prior predictions (Philippsen and Nagai, 2019).

Several studies have also indicated that children with developmental language disorders, including developmental dyslexia, which is defined by difficulties in reading and spelling and by impaired phonological processing (Ramus et al., 2003).

Such instability of reliance on prior prediction could also influence the precision of perceptual uncertainty, as the precision is estimated by the inverse variance of the sensory input (i.e., the prior distribution) (Koelsch et al., 2019).

However, the unique features of predictive processing and statistical learning in ASD may not always result in negative outcomes but could have positive effects in certain situations. Several studies have reported that individuals with ASD sometimes exhibit superiority in certain abilities (Boucher et al., 2012).

Thus, atypical brain development may display specific characteristics (rather than decay or facilitation) of predictive processing. It is assumed that these specificities of predictive processing, that is, hypo-/hyper-sensitivity and hypo-/hyper-priors, could impact statistical learning ability and (statistical) creativity.

A previous study has found that individuals with ASD were able to come up with more unconventional and uncertain ideas during divergent thinking tasks compared to typically developed individuals. However, the total number of ideas generated by individuals with ASD was fewer than that of typically developed individuals (Best et al., 2015).

Neural evidence partially supports this finding and explains it by the hypo-connectivity between the prefrontal cortex and other regions in the brains of individuals with ASD (Belmonte et al., 2004).

Previous studies have shown that neural entrainment induced by statistical learning is enhanced when the prefrontal cortex is temporarily disrupted using repetitive transcranial magnetic stimulation (rTMS). This suggests that the temporary disruption of prefrontal cortex function may have caused a hypo-prior or hyper-sensitive state in the brain, potentially resulting in improved statistical learning ability. Our simulation experiments have also shown that hyper-sensitivity leads to improved statistical learning ability in all respects: reduction of prediction errors, increase in chunks, and 1-3 Hz rhythm acquisition, thereby supporting the findings of these previous studies.

However, it is important to note that our simulation only controlled sensitivity (bottom-up processing), not prior (top-down processing), and the models repeatedly learned the same music. Therefore, in the hyper-sensitive model, the reliability of the internal model inevitably increases due to the repeated learning of the same information. This means that hyper-sensitivity “during learning” could lead to a kind of hyper-prior “during production”. Future studies need to investigate how efficiency in learning (perception) and novelty in creativity (production) are affected when learning various types of information or when controlling for both sensitivity and prior.

In summary, atypical alterations in prior prediction may display specific cognitive individuality involved in perception and production (or learning and creation) through statistical learning. However, one type of individuality may not necessarily be favored over another, as the efficiency of learning and the ease of creating new information may be partially in a trade-off. This study suggests that simulation experiments using statistical learning may lead to a better understanding of the relationship between learning efficiency and creativity in learning systems that exhibit different levels of dependence on sensory signals. Further research on cognitive individuality may illuminate the potential diversity in human society.

This study suggests that hyper-sensitivity allows for efficient statistical learning of information but makes it difficult to generate new information, while hypo-sensitivity makes statistical learning difficult but may make it easier to generate new information. Different individual characteristics may not necessarily be favored over one another, as the efficiency of learning and the ease of generating new information may be partially in a trade-off. This study has the potential to shed light on the underlying factors contributing to the heterogeneous nature of the supposedly innate ability of statistical learning that all individuals possess, as well as the paradoxical phenomenon in which individuals with certain cognitive traits that impede specific types of perceptual abilities exhibit superior performance in creative contexts.

This research was supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI (22KK0157; 21H05063; 22H05210; 22KK0157), and the Japan Science and Technology Agency (JST) Moonshot Goal 9 (JPMJMS2296), Japan. The funding sources had no role in the decision to publish or prepare the manuscript.

The authors declare no competing financial interests.

T.D. conceived the method of experiment and data analyses. T.D. analyzed the data and prepared the figures. K.K. and M.M. surveyed previous literature and compiled it into a table. T.D. wrote the original draft of the manuscript. K.K. and M.M. reviewed and edited the manuscript. All authors finalized the manuscript.

The scripts for the computational model (Hierarchical Bayesian Statistical Learning: HBSL) and analysis and all data including results have been deposited to an external source (