Origins of Language

Origins of Language (Revisited)

“The origin of language, or glottogony/glossogeny, is a topic that has been written about for centuries, but the ephemeral nature of speech means that there is almost no data on which to base conclusions on the subject” (Ying). So how do we study it? Where did these research-backed hypotheses come from if there’s almost no data? An easy place to start is the basic anatomy of the human mouth and the sounds it’s capable of producing. Throughout this paper, we’re going to explore the places of articulation and the sounds they produce, the speech development timeline, and the various theories of language.


How speech works is mind-blowing when you stop to consider it. We have what most animals don’t: an elongated pharynx (Goodall). That elongated throat is what makes it possible for us to speak. However, you don’t come out of the womb speaking in full-fledged sentences; you develop speech throughout your life.

There are three branches of phonetics (acoustic, articulatory, and auditory), but we’re only going to focus on the articulatory. Articulatory phonetics is the branch that deals with how the sounds that come out of our mouths are made.


Articulatory phonetics is the “study of how speech sounds are made, or articulated” (Yule). The internal organ responsible for the sounds coming out of your mouth is your larynx. Inside the larynx are your vocal cords, which have two basic positions: spread apart or drawn together. When you push air through your larynx with the vocal cords spread apart, you make voiceless sounds; when they’re drawn together, you make voiced sounds. A way to tell the two apart is to produce sounds like S-S-S-S or F-F-F-F (voiceless) and Z-Z-Z-Z or V-V-V-V (voiced) while feeling your neck at the Adam’s apple. If the sound is voiced, you should feel a vibration; if it’s voiceless, you shouldn’t feel any (Yule).


After the air passes through your larynx, its journey isn’t finished. As you talk, the air is released and parts of your oral cavity constrict to shape it. “If you slice a head right down the middle, you will be able to see the parts of the oral cavity that are crucially involved in speech production” (Yule). These are the larynx, pharynx, tongue, vocal folds, uvula, velum, palate, and lips.

Your lips help you enunciate. For bilabial consonants, both lips are used. For labiodentals, you use your upper teeth and lower lip. For alveolar sounds, the front of your tongue touches the alveolar ridge just behind your upper teeth. Any sound where you use your teeth is called a dental, and any sound where the tip of your tongue goes between your teeth is called an interdental. Any sound where you use your palate is called a palatal, and any sound made at the velum, toward the back of your mouth, is called a velar. Each of these terms applies only to consonants because vowels use a freer flow of air.

For vowels, the mouth is divided into three areas: the front, central, and back. All vowels, along with diphthongs, are produced primarily with our throat and these three areas. The vowels and diphthongs “glide” out of our throats and through our mouths while we say them. A few examples are (front): bead, bid, bed, bad; (central): above, oven, butt, blood; and (back): boo, book, born, caught, cot (Yule). Try saying each of these words slowly and concentrate on where the vowel is coming from. This’ll give you a better understanding of how they are produced.


Before diving further into the realm of language, it’s important to note that the speech development timeline below isn’t strict about ages. The ages are approximate, as each child is different and learns at their own pace.

“The real engine of verbal communication is the spoken language we acquire as children” (Pinker). Speech starts to develop within the first few months after you are born. From birth, you have the capability to make some sort of noise. The first noise out of a baby’s mouth sounds like a “coo”. To me, “coo” sounds like something you would hear from a small bird, not an infant. However, if you put this into perspective, it makes sense.

During the first few months of life, an infant can produce the “k” and “g” sounds. “By the time you’re five months old, you can distinguish ‘i’ from ‘a’ and ‘ba’ from ‘ga’” (Yule). So, producing “coos” like a small bird is possible. Between about six and eight months, kids start to babble (or what I would like to call Star Wars speak, like the Ewoks). Then, at ten to eleven months, they develop the capability to put together syllables like “ma-da-ga-ba” and they attempt to imitate their surroundings. At this point, it’s important to watch what you say around kids because of the Ding-Dong theory and kids’ ability to mimic the people around them (Johnson).


From twelve to eighteen months, kids move into the “one-word stage,” where they can label everyday objects such as milk, cup, spoon, or cat. At this stage, also known as holophrastic speech, children can make connections between two words as well (Yule). For example, they can associate spoon and bowl, milk and cookies, and so on.

At eighteen to twenty months, children start to move on to the two-word stage. This is where they start creating phrases. In fact, by the age of two, a child may produce “200 or 300 distinct ‘words,’” and “he or she will be capable of understanding five times as many” (Yule). By the time kids are two to two-and-a-half, they’re capable of a more complex type of speech called telegraphic speech. In telegraphic speech, children string lexical morphemes (content words) together into short phrases, and their speech starts to expand more rapidly than before. Their sentence-building starts to improve, and they start to put words together more accurately.


The theories of language were proposed to explain how humans began to speak. While there is barely any data to back these theories up, researchers have managed to put forward the following hypotheses. Each theory is unique in its own way, but a few of them could be combined with one another to create one bigger theory.


The Psychedelic Glossolalia Hypothesis ties the origin of language to speaking in tongues brought on by consuming psychedelic fungi. This theory vaguely reminds me of the term “Parseltongue” from Harry Potter, but in this case, it deals more with the Pentecostal church and various other religions and tribes. The hypothesis holds that early humans in Africa, when their land was dry and resources were scarce, consumed Psilocybe mushrooms, which led to complex and unnatural communicative speech. This theory is clearly a reach as an account of how language came to be, which is why it is listed first as the least practical.


Traced back to Charles Darwin, the Ta-Ta theory proposes that speech arose from humans imitating hand gestures with the tongue and mouth. (Get ready for this fancy name.) “Vilayanur S. Ramachandran’s research into synesthesia and sound symbolism” supports Darwin’s hypothesis (Ying).

However, despite all of Ramachandran’s research, there is still a plethora of questions that remain unanswered. Where did the original hand gestures come from? Even though “sign languages do have somewhat imitative gestures, they also contain quite arbitrary symbols and have vastly different meanings” throughout the world (Ying).

Another issue with Darwin’s theory is that hand and facial gestures are useless if they’re unseen. If you can’t see someone’s face, how could you imitate what they gestured? Also, if your hands are busy with another task, how can you use them to make gestures?

This theory, while well thought out, reminds me of Meet the Fockers with Robert De Niro and Ben Stiller. There is a scene in the movie where De Niro teaches his grandson Jack signs for various things a child should know, such as eat, poop, and drink. Unfortunately, while Stiller was babysitting Jack, he didn’t know the meaning behind each sign. Stiller thought Jack needed something, but the message was misconstrued: Stiller thought Jack signed for food when he had signed for drink. The result was a massive tantrum from Jack. De Niro came home to a screaming toddler and a frazzled Stiller, and Stiller was accused of not catering to Jack’s needs.

While Stiller could’ve just called De Niro to ask which signs meant what, he wouldn’t have been able to see De Niro show him the signs through the phone. Just this example is enough for me to regard this theory as invalid.

3.3 YO-HE-HO

The Yo-He-Ho theory deals more with poetry than anything else. “According to this hypothesis, language arose in rhythmic chants, and vocalisms uttered by people engaged in communal labour” (Ying). This theory still doesn’t have a rightful “owner” and only states that people sing in groups. However, “it’s uncertain from this hypothesis how meanings became associated with songs that were sung by workers” (Ying).


The Uh-Oh theory is modeled on how monkeys use warning calls. “According to this hypothesis language begins with the use of arbitrary symbols that represent warnings to other members of the human band” (Ying). Like how monkeys warn their troop about predators in the area, or even how they warn each other when they overstep their boundaries, humans have different “warning calls” for different things. If your sibling is about to eat the last of your favorite cereal, you might yell at them to save the rest for you instead. If your child is trying to jump off a swing, you might warn them not to or they will break an arm or a leg. This theory seems logical because it does not just include single words or phrases, but it is still uncertain how the more abstract features of our language evolved.


According to the Danish linguist Otto Jespersen, “speech developed from the instinctive sounds people make in emotional circumstances” (Yule 3). This would explain how expressions such as “Ouch!”, “Ah!”, “Ooh!”, “Phew!”, “Yuck!”, and “Wow!” came about (Yule 3). However, because this theory limits speech to just expressing emotions, it does not tell us where the other sounds of language came from. “The clicks, intakes of breath, and other noises which are used in this way bear little relationship to the vowels and consonants” (Nordquist). Without an account of vowels and consonants, we do not have a clear picture of speech and where it came from, which makes this theory impractical.


Watch the Birdie is associated with E.H. Sturtevant, a linguist and ethologist from Jacksonville, Illinois. He received his Ph.D. from the University of Chicago and established the Indo-European character of Hittite, an extinct language once spoken in what is now Turkey (Edgar Howard). Sturtevant came up with Watch the Birdie because he believed that “humans found selective advantage in being able to deceive other humans,” thus giving them the capability to learn how to react to things happening around them (Ying). For instance, if you are at work and your coworker does something to upset you, you cannot show your true reaction because you are in a place of business. There are certain ways to react to things in certain settings, and the Watch the Birdie theory is an example of learning to “read the room” before reacting.


The Ding-Dong theory is based on the notion that humans grow to mimic the sounds of the world around them (Ying). This theory deals with onomatopoeic words such as “boom”, “splash”, “rattle”, and more. It also relates to how infants’ speech develops at an early age while they are learning to speak. Onomatopoeic words seem easy to come by naturally, and languages other than English fit this theory as well. However, it does not explain how the words for inanimate objects were created. The rock “splashes” into the river, but how did the word “rock,” or any of the other words in this sentence, come about?


The Bow-Wow theory is another theory discussed by the Danish linguist Otto Jespersen. This concept is like the Ding-Dong theory because both theories say that speech is onomatopoeic, but the Bow-Wow theory deals specifically with imitating the animal sounds around us rather than the sounds of the world in general (Ying). However, this theory does not translate well across different languages. “For instance, a dog’s bark is heard as au au in Brazil, ham ham in Albania, and wang, wang in China” (Nordquist). The result is another faulty theory.


From the moment we start to speak at a very young age, language continues to develop. The next time you speak to another being, whether it’s a human or an animal, take into consideration that one of the few things that separate us from the non-speaking chimpanzees is an elongated pharynx. Our places of articulation may seem small, but they’re so powerful. Language will continue to develop without our control. Whether it’s slang or newly documented words, the potential for new vocabulary in the future is immense.

While most of the previous theories don’t have much research to back them, the theories that scientists have put forward throughout the years seem mostly plausible. All it takes is for someone to pick up a hypothesis and try to experiment with it. Hypotheses are supposed to be based on limited evidence so that whoever picks them up next can experiment further. However, while this research is interesting, this is not a complete analysis of the subject at hand, and it’ll require a more in-depth study of the origins of language.

