
Parallel Speech Corpus

This corpus presents equivalent semantic content translated into a variety of languages. For the same content, a higher syllable count means a lower information density per syllable. The studies mentioned below represent information density in two different ways:

  1. Calculating the proportion of syllables to a reference language (Vietnamese)
  2. Averaging the Shannon entropy of each syllable-to-syllable transition (including the probabilities of syllables occurring first).
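To make the second measure concrete, here is a minimal sketch that estimates the entropy of a syllable given the syllable before it, from raw counts. It assumes the measure is the conditional entropy H(syllable | previous syllable); the `START` marker, the function name, and the list-of-syllable-lists input format are illustrative choices, not taken from the studies.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical marker meaning "no previous syllable"


def conditional_entropy(words):
    """Estimate H(syllable | previous syllable) in bits.

    `words` is a list of words, each given as a list of syllables.
    The first syllable of every word is conditioned on START, which is
    how first-syllable probabilities enter the average (an assumption).
    """
    pair_counts = Counter()     # (previous, current) -> count
    context_counts = Counter()  # previous -> count
    for syllables in words:
        prev = START
        for syl in syllables:
            pair_counts[(prev, syl)] += 1
            context_counts[prev] += 1
            prev = syl

    total = sum(pair_counts.values())
    entropy = 0.0
    for (prev, _syl), n in pair_counts.items():
        p_pair = n / total                 # joint probability of the pair
        p_cond = n / context_counts[prev]  # probability of syl given prev
        entropy -= p_pair * math.log2(p_cond)
    return entropy


# Toy input: far too small to say anything about a real language.
toy = [["ba", "na", "na"], ["ba", "by"], ["na", "no"]]
print(conditional_entropy(toy))
```

On a toy list like this the number is meaningless; a stable estimate needs a corpus large enough for the pair counts to be reliable.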

Here, we provide only the syllable counts. The first measure can be intuited by comparing the counts directly, and the second cannot be computed without a large corpus of text from which to estimate the probabilities.
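As an illustration of the first measure, the sketch below divides each language's syllable count by the Vietnamese count for one passage, using the Text O1 counts listed under Passages. The ratio direction (language over reference) and the function name are assumptions made for this example; with this direction, a larger ratio means more syllables for the same content, hence lower density.

```python
# Syllable counts for Text O1, copied from this page.
counts_o1 = {
    "Mandarin": 60,
    "English": 70,
    "Hîsyêô": 112,
    "Japanese": 119,
    "Spanish": 94,
    "Thai": 64,
    "Vietnamese": 49,
}


def syllable_ratio(counts, reference="Vietnamese"):
    """Return each language's syllable count divided by the reference count."""
    ref = counts[reference]
    return {lang: n / ref for lang, n in counts.items()}


# Print languages from fewest to most syllables relative to the reference.
for lang, ratio in sorted(syllable_ratio(counts_o1).items(), key=lambda kv: kv[1]):
    print(f"{lang:<11}{ratio:.2f}")
```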

Source

The texts come from Multilingual Text Tools and Corpora for Central and Eastern European Languages, EU Copernicus Project COP106 (source), and from the additional texts provided through this research paper (supplemental materials):

Christophe Coupé et al., Different languages, similar encoding efficiency: Comparable information rates across the human communicative niche. Sci. Adv. 5, eaaw2594 (2019). DOI: 10.1126/sciadv.aaw2594

The authors graciously provided some of the data on GitHub.

Passages

The passages for 6 languages are presented below; you can check out the details of 11 more in the paper and supporting materials.

Text O1

Number of Syllables:

Mandarin  English  Hîsyêô  Japanese  Spanish  Thai  Vietnamese
60        70       112     119       94       64    49

Text O2

Number of Syllables:

Mandarin  English  Hîsyêô  Japanese  Spanish  Thai  Vietnamese
73        86       129     162       135      74    93

Text O3

Number of Syllables:

Mandarin  English  Hîsyêô  Japanese  Spanish  Thai  Vietnamese
73        85       125     153       111      85    55

Text O4

Number of Syllables:

Mandarin  English  Hîsyêô  Japanese  Spanish  Thai  Vietnamese
80        84       155     154       152      110   89

Text O6

Number of Syllables:

Mandarin  English  Hîsyêô  Japanese  Spanish  Thai  Vietnamese
64        67       105     129       103      78    80

Text O8

Number of Syllables:

Mandarin  English  Hîsyêô  Japanese  Spanish  Thai  Vietnamese
69        84       118     159       136      93    81

Text O9

Number of Syllables:

Mandarin  English  Hîsyêô  Japanese  Spanish  Thai  Vietnamese
57        67       98      83        79       60    52

Text P0

Number of syllables:

Mandarin  English  Hîsyêô  Japanese  Spanish  Thai  Vietnamese
100       92       152     156       153      103   102

Text P1

Number of syllables:

Mandarin  English  Hîsyêô  Japanese  Spanish  Thai  Vietnamese
70        105      115     131       137      96    77

Text P2

Number of syllables:

Mandarin  English  Hîsyêô  Japanese  Spanish  Thai  Vietnamese
65        76       130     142       120      79    81

Text P3

Number of syllables:

Mandarin  English  Hîsyêô  Japanese  Spanish  Thai  Vietnamese
77        91       152     152       119      95    90

Text P8

Number of syllables:

Mandarin  English  Hîsyêô  Japanese  Spanish  Thai  Vietnamese
78        77       133     117       93       81    75

Text P9

Number of syllables:

Mandarin  English  Hîsyêô  Japanese  Spanish  Thai  Vietnamese
76        91       151     139       119      77    61

Text Q0

Number of syllables:

Mandarin  English  Hîsyêô  Japanese  Spanish  Thai  Vietnamese
68        88       142     150       126      73    56

Text Q1

Number of syllables:

Mandarin  English  Hîsyêô  Japanese  Spanish  Thai  Vietnamese
57        62       99      126       81       56    55
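The counts for all fifteen texts can be combined to compare languages across passages. The sketch below is one possible way to do so, not the procedure used in the cited paper: it sums the counts per language and averages the per-text ratio to Vietnamese. The variable and function names are hypothetical, and the numbers are copied from the passages listed on this page.

```python
LANGUAGES = ["Mandarin", "English", "Hîsyêô", "Japanese",
             "Spanish", "Thai", "Vietnamese"]

# One row per text, in the column order given by LANGUAGES.
COUNTS = {
    "O1": [60, 70, 112, 119, 94, 64, 49],
    "O2": [73, 86, 129, 162, 135, 74, 93],
    "O3": [73, 85, 125, 153, 111, 85, 55],
    "O4": [80, 84, 155, 154, 152, 110, 89],
    "O6": [64, 67, 105, 129, 103, 78, 80],
    "O8": [69, 84, 118, 159, 136, 93, 81],
    "O9": [57, 67, 98, 83, 79, 60, 52],
    "P0": [100, 92, 152, 156, 153, 103, 102],
    "P1": [70, 105, 115, 131, 137, 96, 77],
    "P2": [65, 76, 130, 142, 120, 79, 81],
    "P3": [77, 91, 152, 152, 119, 95, 90],
    "P8": [78, 77, 133, 117, 93, 81, 75],
    "P9": [76, 91, 151, 139, 119, 77, 61],
    "Q0": [68, 88, 142, 150, 126, 73, 56],
    "Q1": [57, 62, 99, 126, 81, 56, 55],
}


def totals(counts):
    """Total syllables per language over all texts."""
    return {lang: sum(row[i] for row in counts.values())
            for i, lang in enumerate(LANGUAGES)}


def mean_ratio_to(counts, reference="Vietnamese"):
    """Average, over texts, of each language's count divided by the reference count."""
    ref = LANGUAGES.index(reference)
    n_texts = len(counts)
    return {lang: sum(row[i] / row[ref] for row in counts.values()) / n_texts
            for i, lang in enumerate(LANGUAGES)}


total = totals(COUNTS)
ratio = mean_ratio_to(COUNTS)
for lang in LANGUAGES:
    print(f"{lang:<11}{total[lang]:>6}{ratio[lang]:>8.2f}")
```

Averaging per-text ratios weights every passage equally, whereas dividing the totals would give longer passages more influence; either choice is defensible for a quick comparison.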