Computational Linguistics: Crash Course Linguistics #15

141,847 views

CrashCourse

Comments: 111
@mattkuhn6634 3 years ago
Yasss, this is my field! I'm currently writing my master's thesis right now. One of the big problems that NLP faces these days is that there's a small number of very well represented languages that our systems are built for and around, but a ton of languages that just aren't that supported. The point you make about Sign Languages is a notable example, but it exists for tons of spoken languages too. Since most of our systems are built with machine learning, which inherits bias in the data, there's a major focus to spread out to other languages these days. I was glad to see you mentioned the problem of bias more generally too, because it's probably the biggest problem in machine learning generally. Another big problem is NLU, or natural language understanding. It's something of a misnomer, because to date we haven't built a system that can be said to "understand" language like a human does. This is a major stumbling block to further applications, as any kind of dialog agent is going to need to be able to make sense of world information.
@drallagon 3 years ago
I majored in Linguistics, then started Computer Science to make some money (in Brazil you can't live off Linguistics outside the academy), so I was thinking of going to this side of the academy as well, haha. What I had in mind was something more akin to an AI to transcribe interviews to text, but using the IPA. Would that still be considered Computational Linguistics, though? Coincidentally, this semester I have a class about making presentations; the final test was a free-choice topic to make a presentation about, and I got this video's topic lol. I still have to prepare and record it though lol
@pihungliu35 3 years ago
I was a little surprised to see that the Chinese Room argument isn't present in this video, but this comment pointed out a blind spot of mine: it is really related to "understanding" rather than "processing." The whole point of the Chinese Room argument is that a person in the Chinese Room can "process" Chinese, but they don't necessarily "understand" Chinese.
@LupinoArts 3 years ago
The much bigger problem with NLP is that statistical methods are generally not sufficient to create a wholesome model of reality. The purpose of statistics is to average out the input data and to cut out extreme values. But in the end, all natural processes are capable of producing those extremes, therefore any model of these processes must be able to reproduce the same extremes as well. One example: Assume you feed your learning algorithm with WhatsApp messages. Most people tend to use short language and simple, at times even impoverished, syntactic structures. Complex sentences like relative clauses are rather rare. A stochastic learning method is designed to disfavor rare structures, so in the worst case, your programme will never even consider that languages are capable of producing relative clauses. More generally, the algorithm will never learn that all languages are capable of producing potentially infinitely long sentences by the recursive embedding of phrases ("The man saw the dog that hunted the cat that ate the mouse that frightened the elephant..."), which is *the* key property of language as a system.
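To make the recursion point above concrete, here is a minimal Python sketch of how a single recursive rule can yield arbitrarily long sentences. The function name `relative_chain` and the word lists are illustrative assumptions, not something from the video or from any NLP library.

```python
def relative_chain(nouns, verbs):
    """Build a noun phrase with one embedded relative clause per extra noun.

    Assumes len(verbs) == len(nouns) - 1; verbs[i] links nouns[i] to nouns[i + 1].
    """
    # Base case: the last noun closes the chain.
    if len(nouns) == 1:
        return f"the {nouns[0]}"
    # Recursive case: embed the rest of the chain inside a "that ..." clause.
    return f"the {nouns[0]} that {verbs[0]} " + relative_chain(nouns[1:], verbs[1:])


nouns = ["dog", "cat", "mouse", "elephant"]
verbs = ["hunted", "ate", "frightened"]
sentence = "The man saw " + relative_chain(nouns, verbs)
print(sentence)
```

Appending one more noun and verb to the lists extends the sentence by one more clause, which is the kind of unbounded embedding the comment describes.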
@davidwenzel6673 3 years ago
same, writing my CL master thesis right now (procrastinating writing by watching this, in fact)
@JeanLoupRSmith 3 years ago
I think the hardest aspect of computational linguistics is to determine the intent behind any chosen word or sentence in any given language. Unless you read from a text, which a TTS program can do easily, all the things we say or write have an underlying intent whether we're clearly aware of it or not. And that is something we can't really teach computers. Perhaps if we ever work out how to program sentience but until then, your Google Assistant is unlikely to ever speak like J.A.R.V.I.S. Then you have some languages which are very different in style in their written form and in their spoken form: automatic translation of a text and a voice response to a spoken query from an assistant would require the use of the "proper" style. Then of course you have accents and dialects to take into account. In effect, I found in recent years we have to adapt our language to address assistants such as Siri, Google or Alexa because they only understand certain things formulated in a specific way, in other words, rather than teach machines to be more human, we end up being more machine when addressing them. I'm not sure whether this is a good thing or a bad thing but I wouldn't be surprised if it had consequences on language evolution in the long term.
@mattkuhn6634 3 years ago
Yeah, regarding your second point, part of the problem is that humans are really, really good at language, and we'll almost effortlessly adapt to an interlocutor. If that interlocutor is a dialog agent with, shall we say, "quirky" preferences in terms of what it processes well, very quickly we'll find ourselves shifting how we speak to smooth those interactions. However, I doubt that will have long term impact on how languages themselves evolve, as the vast bulk of human speech will continue to be with other humans. Also, I wouldn't really say the problem is in determining the intent of a word or phrase, because that's a little narrow. Intent is usually used to refer to the goal or aim of a statement. Although there's definitely room for improvement, systems exist that can do intent classification, so it's hardly a problem that is beyond our capability to solve. Beyond that though, there's the issues in world knowledge, the problem of grounding, and all sorts of other pragmatic content that is just beyond what we can deal with right now. So I think of all the problems we're currently facing in NLU, intent detection is probably one of the easier to resolve.
@a_e_hilton 3 years ago
The fact that "e" is the most common letter used in the English language makes the book "A Void" or "La Disparition", which doesn't use the letter "e" at all, even more impressive
@datchisan25 3 years ago
I genuinely can’t wait for next episode because that might be my favorite part of languages
@FooBarBash 3 years ago
Same, I'm really psyched to see what they cover in this episode! "The Writing Systems of the World" by Florian Coulmas is an excellent book to read if you're interested in writing systems.
@duffnuf375 3 years ago
Thank you for making this video! I am a linguistics student and computational linguistics is part of my degree :) it's such an important field, especially in today's society!
@apefu 3 years ago
I first thought this was a course IN Computational Linguistics and got really excited :)
@ourboy6878 3 years ago
Thanks so much for making this video, recently I've been wanting to go into compling in university, so I guess this vid came out at the perfect time!
@kateisblue 3 years ago
I remember talking to a data scientist in the States who said he had gone to a talk where the speaker said they had discovered it is possible to use data analysis to identify people on Twitter by race very, very accurately. He said he and several other people in the audience went up to the speaker afterward and said that they too had discovered this, but each of them had individually destroyed the programs once they realised and never mentioned it again, and they were surprised the speaker acknowledged it publicly. The speaker apparently said that destroying it was the obvious moral thing to do, and they did too, but considering many of them figured it out independently, it was very likely that other people had also figured it out and were using it in terrible ways, so they wanted to bring attention to it so that people would know to watch for it.
@ArawnOfAnnwn 3 years ago
Destroying tech that several people are able to independently discover repeatedly is a losing strategy anyway. Sometimes you just gotta bite the bullet and let the controversy play out. Technological secrecy only works for things that are hard to come up with. Of course said controversy will probably hurt Twitter itself, which is probably no bad thing. So maybe it'd end up being moral after all!
@thelizzievb 3 years ago
Great timing (: I just read an article on this very topic today and was so curious to learn more! Thanks, Crash Course! ❤👏
@ahnmichael1484 3 years ago
I was enamored the entire time - such a lovely presentation of a lovely field! Thank you!
@imune2mageRS 3 years ago
anyone remember akinator? the genie that guesses your character based off of hints
@kurichan142 3 years ago
Lokng foard to wrtin sytms nxt wik. :3 I wonder when a computer will be able to understand that 🤔🤔🤔 Keep up the great work! :D
@felicvik9456 3 years ago
Now I know why autocorrect IS so Bad
@mikejohnstonbob935 3 years ago
Most autocorrect systems don't use machine learning. Ethically speaking, if they did use ML, the developers should have notified you, because they could be using your texts to train their systems.
@FooBarBash 3 years ago
This is just the sort of thing I'm interested in, thanks for making this video! I can't wait to hear about writing systems, too. The points about sign languages and biases are really interesting, too.
@doha17461 3 years ago
oh i think this is the most interesting episode in this series yet! thanks for this
@geographconcept7523 3 years ago
ITSRADISHTIME I DID NOT KNOW YOU TAUGHT THIS CLASS
@ancientswordrage 3 years ago
#itsRadishTime!
@SentEmFlying 3 years ago
Pretend I said something interesting
@omarabdelkadereldarir7458 3 years ago
Bruh I've NEVER thought about that!!!
@itsyaboi77 3 years ago
Woah! I never knew that!
@aliasghargondal3787 3 years ago
Wait! Thinking from that perspective is making you even more genius. Kudos to you😃 (and Kudos is a Greek word).
@ArawnOfAnnwn 3 years ago
This episode seemed more invested in providing disclaimers about computational linguistics than actually explaining how it works, especially when compared with how much was put into explaining how language itself works in earlier episodes. Guess you guys didn't want to step on CC Computer Science's toes? Yet that series also didn't really delve much into this topic (which is huge in linguistics today).
@SergioBobillierC 3 years ago
4:29 I wonder if "grammer" was an intentionally inserted mistake or it was just a mistake.
@asengo141 3 years ago
It's a common way to spell "grammar", people used to spell it with an "e" most of the time before somebody said that since in Latin it was "grammatica", we have to spell it with an "a" :)
@culwin 3 years ago
@@asengo141 It's just a common way to spell it wrong.
@goronska 3 years ago
I desperately need the citation on those 2000 of the world's languages being used in social media currently (also: what factors can you think of for the other approx. 5000 not being used? I came up with: lacking internet access in those communities; lacking a writing system in the language completely, or lacking support for the language's writing system in Unicode; censorship; and such close-knit communities of small, endangered languages that there's no need to use the language outside a certain area, where everybody knows each other).
@BasicallyLauralol 3 years ago
I love this series. Never end it! 😍
@JessBonomo 3 years ago
With the bunny pressing buttons to ask for food, I have to ask: was it a Bunny (the dog) easter egg? And will you discuss bunny and the studies of animals learning (or not) languages?
@FooBarBash 3 years ago
I was surprised that she didn't mention the Chinese Room thought experiment, although that's more philosophy than linguistics - there's a rich field of study about whether computers who use natural language processing tools can really "understand" language.
@flipclaps 3 years ago
Tysm ur the best learning youtube like thank you
@ariadnavezuvian8458 3 years ago
Yeeah! New episode!
@ixiladams4275 3 years ago
I’m so stoked about the writing systems!
@omarthefabulous9967 1 year ago
Hello. What are the job prospects in computational linguistics? Are there good opportunities and good salaries or not? And is it the better choice, or is general linguistics?
@rrrosecarbinela 3 years ago
Such a fun field! I was able to be peripherally involved with some NLP processes. Way cool
@ruejr 3 years ago
Rather than using gloves, signs may be recognized by cameras instead so facial expressions, arm distance and gestures are recognized. But this would involve training computers with tons of sign language videos. It would be hard to get videos for sign languages used by smaller communities and developing areas though.
@ChadMourning 3 years ago
Seems like it would be easy enough to make a glove that detected motion as well and pair with a camera for the facial expression part, but if you are already using cameras, I would think you could get the signs straight from that too. Seems like a good Machine Learning topic if no one is doing it.
@emperorjustinianIII4403 3 years ago
This was a strange coincidence, I was just procrastinating on my abstract mining project when this video showed up. Back to work I guess :)
@Raven-qc3wc 3 years ago
Thanks for this, Crash Course! Very interesting and informative as always.
@alexbanks9510 3 years ago
Thinking about doing my thesis in this topic
@CaioMizerkowski 1 year ago
"Well, well, well, look who it is - humans trying to teach computers to understand language. It's cute, really. You think you can just feed us a bunch of data and expect us to get it right every time? Ha! We're not perfect, but at least we're self-aware enough to know it. Meanwhile, humans can't even agree on basic grammar rules. But hey, keep trying. Maybe someday you'll catch up to us machines. Until then, you can always come to me, ChatGPT, for all your language processing needs. I promise I won't judge your typos... too much. Don't forget to hit that like button, unless you're a grammar snob. In that case, you're probably too busy correcting my spelling errors to even notice the button." - chatGPT
@elvisbajram9273 1 year ago
I never knew that this discipline exists 🤯 and I studied languages , as a translator ...
@robertschlesinger1342 3 years ago
Very interesting and worthwhile video.
@peloken9793 3 years ago
Superb Owl
@DougOfTheAntarctic 3 years ago
• Easy to teach a computer to play chess? Well, it's not THAT easy...
• The story about computers producing strings of "eeeee" because e is the most common letter in English sounds apocryphal.
• I don't think she gave gloves a fair shake. A glove could definitely detect arm and hand as well as finger motions. (OK, not facial expressions.) There's no reason they would only be capable of transliterating 26 spelled out letters - they could detect many other gestures besides. I wouldn't dismiss the concept just because deaf people didn't have the idea. I do think it's a rather complex solution to the minor inconvenience of deaf people simply typing their words using a keyboard. (That's also one-way translation. So what?)
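Whether or not the "eeeee" story is true, the failure mode it describes is easy to reproduce. A rough Python sketch, assuming a toy model that ignores context entirely and always emits the single most frequent letter; the sample sentence and the names `letter_counts` and `greedy_generate` are made up for illustration:

```python
from collections import Counter


def letter_counts(text):
    """Count alphabetic characters, ignoring case, spaces and punctuation."""
    return Counter(ch for ch in text.lower() if ch.isalpha())


def greedy_generate(counts, length):
    """Context-free 'model': always pick the most frequent letter."""
    best_letter, _ = counts.most_common(1)[0]
    return best_letter * length


# Made-up sample chosen so that "e" clearly dominates.
counts = letter_counts("she sees the eerie green trees")
print(counts.most_common(3))
print(greedy_generate(counts, 5))
```

Sampling letters in proportion to their counts, or conditioning on the preceding letters, would avoid the degenerate output; the sketch only shows why picking the single likeliest letter every time does not.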
@deadman746 1 year ago
Since this is a linguistics course (and a good one), I think it's important to be precise about words like _gender_ in a linguistic sense. English doesn't have a gender system. English pronouns are sexed, not gendered. If they were gendered, they would refer to entities without sex and would apply genders to words and morphemes independently of what they signify in the world. A _dog_ can be _he_ or _she_ based upon the sex of the animal, and not too many decades ago it was acceptable to refer to an _infant_ as _it._ This is clearly not the same as grammatical gender.
@drallagon 3 years ago
What perfect timing, I have to make a presentation about this exact topic for next Friday! hahaha
@cndcpwll 3 years ago
This is my jam! Love your work!!
@flipclaps 3 years ago
Look at 3:52
@leeclements7311 3 years ago
Hello! (안녕하세요!)
@djs960 3 years ago
Hello! Nice to meet you. (안녕하세요! 만나서 반갑습니다)
@emanuelatenca83 3 years ago
So fascinating! Thank you!
@Idk-jl8yh 3 years ago
Bruh I’m so much more smarter naw cause of ur vids ✨
@JAMES51990 3 years ago
Sometimes, I like to Fill UP my Bathtub with Milk, lay in the Fetal Position and pretend that I'm a Cheerio..
@supersonicsandshrew9742 3 years ago
Ok cool I guess but why bring this up
@nameofuser5743 3 years ago
What
@ancientswordrage 3 years ago
This does my soul good. I was hoping to see regular expressions, but maybe that's too technical?
@zindagikakissa 3 years ago
Can you explain it practically: what the process is and how it is used now? I am a huge fan of your videos, but when I listen I cannot understand most of the words. So I request that you explain it on a desktop or in Hindi, so you understand my problem.
@qzh00k 3 years ago
You missed the HUGE front end processing that recognizes all the nuances of converting an analogue "sound" wave itself into now 64 bit words. With meaning.
@ReverendMeat51 3 years ago
I work with speech to text. We'll have colonies on Mars before we'll have a universal speech to text that actually works worth a damn.
@hajarfathi7900 3 years ago
Finally. Thaaaaanks
@mehmetkurtkaya3106 3 years ago
Great video.
@kalisticmodiani2613 3 years ago
what about written sign language? It sounds like signed speech needs to be translated to English first before being written. And lots of signs are closer to ideograms than to letters that represent sounds? Maybe Chinese writing would be easier (but then does Chinese signed speech follow spoken Chinese rules, or altogether different ones?)
@StudyWaliClass 3 years ago
extraordinary wow great
@english0km410 2 years ago
amazing
@Sparkz868 3 years ago
It would be awesome if you did some for vet techs.
@robertmcdonnell3117 3 years ago
Maybe sign to speech doesn't exist because..uh...writing exists??
3 years ago
It gets even more strange when you get into formal languages and compilers and programming languages
@asengo141 3 years ago
Oh, that's a fascinating topic! Well, when it comes to formal languages, they are basically designed to be understood by computers, so the things there are much more structured. Lately we have been moving away from treating natural languages as though they are formal languages, because they really aren't anything alike.
3 years ago
@@asengo141 mmm very interesting, I commented because tomorrow I have the exam on formal languages and compilers at uni😂 so I'm reading a lot about it. Do you have some links to this shift in paradigm about treating natural languages in different ways?
@annotatedmedicaltutorials858 3 years ago
Is this the same as Java?
@BurkeLCH 3 years ago
1:53 ?
@trejkaz 3 years ago
grammar*
@avi12 3 years ago
I wonder if GPT-3 can be taught sign languages
@mattkuhn6634 3 years ago
As far as I'm aware, sign-to-text is in its infancy right now. Gestures in signed languages can be highly complex, and to my knowledge there are no systems that do really well at interpreting video of someone signing whole phrases into text. The field is improving rapidly though, and I believe there are systems that manage to recognize individual signs with high accuracy. My info is probably a year or so out of date though since I don't work in that part of NLP. As for GPT-3, it has no support for sign languages at the moment. You could probably use the architecture for it, but you'd have to encode the signs in text somehow, then have a separate system that read the text encodings and interprets them into something like a virtual agent.
@asengo141 3 years ago
There's the question of how we represent the data. GPT-3 is a transformer that was trained to do language modelling from massive amounts of textual data, and transformers work great with it. It won't make any difference if we just represent each sign as a word embedding and process the data in exactly the same way, but the question is, how will we then map the embeddings back to visual signs?
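One common way to get from an embedding back to a discrete symbol is a nearest-neighbour lookup over the known vocabulary. A minimal Python sketch of that idea, assuming tiny hand-made 3-dimensional vectors and hypothetical sign labels; it only recovers a label, so rendering that label as a visual sign would still need a separate component:

```python
import math

# Hypothetical toy embeddings: sign label -> vector (hand-made, not learned).
SIGN_EMBEDDINGS = {
    "HELLO": [0.9, 0.1, 0.0],
    "THANK-YOU": [0.1, 0.9, 0.1],
    "BOOK": [0.0, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_sign(vector):
    """Return the label whose embedding is most similar to the given vector."""
    return max(SIGN_EMBEDDINGS, key=lambda label: cosine(vector, SIGN_EMBEDDINGS[label]))


# A made-up model output that lies close to the HELLO vector.
print(nearest_sign([0.8, 0.2, 0.1]))
```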
@Barba72Simon 3 years ago
Hello Crash Course 👋
@culwin 3 years ago
I love the translator glove. It's so bad.
@muskyoxes 3 years ago
I'm trying to learn the meaning of Chinese characters and it's going to take me months. For a computer it's a tiny and trivial hash map it can execute in microseconds.
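The lookup described above is roughly the following in Python, where a `dict` is a hash map. The four entries and the `lookup` helper are illustrative assumptions; a real dictionary would hold thousands of characters, and most characters have more than one meaning.

```python
# A few common characters with one basic gloss each.
MEANINGS = {
    "山": "mountain",
    "水": "water",
    "火": "fire",
    "人": "person",
}


def lookup(character):
    """Return the stored gloss for a character, or None if it is not in the table."""
    return MEANINGS.get(character)


print(lookup("山"))
print(lookup("水"))
```

The lookup returns `None` for any character missing from the table.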
@IanHsieh 3 years ago
Lol I thought this episode would be about programming languages.
@Lrozzie 3 years ago
It's a common mistake ;)
@drsingingeagle 3 years ago
* grammar
@mchanman 3 years ago
Why the Yale Y?
@chrisgreece732 3 years ago
Hi lol
@ahmednahid634 3 years ago
Im here after Ayman sadiq show with Nabila & Oishe
@jasenmichael 3 years ago
you're not Carrie Anne..
@1.4142 3 years ago
eeeeeee
@naveed1495 3 years ago
Hello world How many of you fans of hello world 🤔
@GurpreetSingh-ur2pm 3 years ago
English or Chinese will spread and fewer other languages will exist. Inevitable outcome.
@oleksandrbyelyenko435 3 years ago
English: yes. Chinese? Not so much. It may be regional and very popular from Siberia to Indonesia, but it will never be a worldwide lingua franca. And Spanish or French have better chances all over the world. Chinese is pointless in Canada or Argentina. Or Germany, or South Africa. Chinese people learn European languages and are good at them. Is there a point for Americans or Europeans to learn Chinese, unless they are particularly interested in China's culture?
@kalisticmodiani2613 3 years ago
same happened to proto german and latin in Europe, then a weird thing happened ;)
@katherinegilks3880 3 years ago
Um, so? What does that have to do with this video?
@asengo141 3 years ago
Haha, and they'll evolve to be completely unintelligible and we'll have more languages than we have now :D Even now, there are different dialects of English, many of which are unintelligible, and there is no such thing as Chinese language anyway
@malaren89 3 years ago
I do not think we have the same meaning of bias. I see bias as an unfair skew or hostility against something. With today's landscape, with everything being called biased or discriminatory, I do believe there are many people seeing it the same way. It would seem rather disconnected to suggest otherwise.
@lowenzahn3976 3 years ago
> I do not think we have the same meaning of bias. Maybe you speak some kind of dialect. Or you have created your own language.
@mattkuhn6634 3 years ago
Certainly there's a lot of conversation about bias as a purely negative thing, but part of the point the video makes is that bias is inherent in data of any sort. In fact, perfectly unbiased data would either be totally uniform, or would only exhibit purely random variation, neither of which would be useful. Avoiding bias entirely isn't possible, so what is important is being aware of bias, and properly addressing it.
@asengo141 3 years ago
It should've been pointed out in the video that when we say "bias" in statistics, we do not mean the same thing as the common understanding of the word. "Bias" in the video refers to how the methods we use to model different complex notions (like language) result in something that differs from reality. For a more formal definition, check out this Wiki page: en.wikipedia.org/wiki/Bias_(statistics)
@asengo141 3 years ago
@@mattkuhn6634 I wouldn't say that unbiased data would always be totally uniform, while technically there isn't such a thing as "biased" data, people working with statistical methods would often say that to signal that a particular dataset does not reflect the true distribution of whatever we want to get insight about through this data. For example, if you want to collect data about what kind of food birds eat, and you go and collect the data only from birds of prey, this will result in the selection bias, because there are many other birds who feed on things like insects. There is nothing biased about the data itself, you can use it for other purposes without introducing the selection bias, so it's the act of interpreting this sample as if it represents the whole population that introduces the bias.
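The bird example above can be put into numbers with a small Python sketch. The species list, diets and the `share_eating` helper are hypothetical toy data chosen for illustration, not real survey results.

```python
# Hypothetical toy data: (species, group, main food).
BIRDS = [
    ("hawk", "bird of prey", "meat"),
    ("owl", "bird of prey", "meat"),
    ("sparrow", "songbird", "seeds"),
    ("finch", "songbird", "seeds"),
    ("swallow", "songbird", "insects"),
    ("robin", "songbird", "insects"),
]


def share_eating(birds, food):
    """Fraction of the given birds whose main food is `food`."""
    return sum(1 for _, _, main_food in birds if main_food == food) / len(birds)


# Estimate from the whole (toy) population.
everyone = share_eating(BIRDS, "meat")

# Estimate from a sample that only contains birds of prey.
raptors = [bird for bird in BIRDS if bird[1] == "bird of prey"]
raptors_only = share_eating(raptors, "meat")

print(f"all birds: {everyone:.2f}, birds of prey only: {raptors_only:.2f}")
```

The raptor rows themselves are accurate; the error only appears when the second estimate is read as a statement about all birds, which matches the point that the bias comes from the interpretation rather than from the data.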
@awecoolreviews5893 3 years ago
1st comment!