Build a corpus from your own texts/data

  37,505 views

Sketch Engine

1 day ago

Learn to build a corpus from your own texts and data: upload them to Sketch Engine and receive an annotated (POS-tagged) and lemmatized corpus. Many languages are supported.
Quick Start Guide: www.sketchengi...
Attend training: www.sketchengi...
Supported by the ELEXIS project

Comments
@ibrahimeleissawi 4 years ago
Hello, do you support the Arabic Language corpus compiling and search? Thanks a lot in advance.
@SketchEngine 4 years ago
Yes, we support Arabic. Please sign up for a trial account and test it: auth.sketchengine.eu/#register/form?form=trial
@yuvrajsharma2200 5 months ago
I want to make a corpus of my own voice. How would I do it?
@AlbaBravoZabalza 5 months ago
I had already generated a txt document to create the corpus from, but I had removed the stop words and things like that. Would it still work here, or is it better to enter it with the stop words and let it get rid of them?
@SketchEngine 5 months ago
Generally speaking, it is better to upload whole texts, including stop words, but it always depends on the goal of your analysis. If you are not sure, please contact us at support@sketchengine.eu with more details.
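One reason whole texts are usually preferable can be sketched in a few lines: removing stop words makes words adjacent that were never adjacent in the original, which distorts anything based on word neighbours. This is only an illustration in plain Python; the stop-word list and the sample sentence are made up, and nothing here reflects how Sketch Engine itself processes text.

```python
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "on", "a", "of"}

def bigrams(tokens):
    """Count pairs of directly adjacent tokens."""
    return Counter(zip(tokens, tokens[1:]))

tokens = "the cat sat on the mat".split()
filtered = [t for t in tokens if t not in STOP_WORDS]

original_pairs = bigrams(tokens)
filtered_pairs = bigrams(filtered)

# ("sat", "mat") is not an adjacent pair in the original text, but it
# becomes one once the stop words between the two words are removed.
print(("sat", "mat") in original_pairs)
print(("sat", "mat") in filtered_pairs)
```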
@yassine-31d 7 months ago
Should I input a pre-processed corpus, or does Sketch Engine pre-process it for me? If Sketch Engine does the pre-processing, what kind of tasks does it perform? Thanks.
@SketchEngine 7 months ago
Depending on the language, your texts will be tagged and lemmatized. Check the support for languages here: www.sketchengine.eu/corpora-and-languages/
@ChristianHarten1911 2 years ago
Adding PDFs isn't working for me when I try to create a multilingual corpus. The program says it only accepts the following formats: .csv, .tmx, .tsv, .xls, .xlsx, .xliff, .zip. What am I doing wrong?
@SketchEngine 2 years ago
You must have clicked on the option to build a multilingual (parallel) corpus. PDF is not supported for this option. PDF is supported when building monolingual corpora as demonstrated in this video.
@ChristianHarten1911 2 years ago
@SketchEngine Thanks for the answer :)
@Laura_380 1 year ago
Hello, is one text format better than the other? Is .txt the easiest one to process for Sketch Engine?
@SketchEngine 1 year ago
Theoretically, txt can be more reliable than doc if the doc contains lots of fancy formatting. But if the doc is simply one column of text, for example a scientific paper or an essay, then there is hardly any difference between txt and doc. However, if you want to be 100% sure what goes into your corpus, use txt, because then Sketch Engine does not have to convert the document into plain text. In practice, there is rarely any difference as long as the formatting is simple. If the doc document contains multiple columns, tables, embedded images, floating text boxes etc., the conversion to plain text may put such content in a different order, change the line breaks or remove some of the content completely.
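If you want to see for yourself what plain text a document yields before uploading it, a rough local check is possible. The sketch below uses only the Python standard library and assumes a .docx file (not the older binary .doc); the function name `docx_to_text` is made up for this example, and this is not the converter Sketch Engine uses. It also ignores text boxes, headers and footnotes, which is exactly the kind of content the reply above warns about.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by WordprocessingML, the format inside .docx files.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_to_text(source):
    """Return the body text of a .docx file, one line per paragraph.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph is split into runs; join the text of all runs.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        paragraphs.append(text)
    return "\n".join(paragraphs)

# Example usage (hypothetical file name):
# print(docx_to_text("essay.docx"))
```

Comparing this output with the original document gives a quick impression of whether tables or columns are likely to end up in an unexpected order.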
@michalmravec4284 3 years ago
What should I do when I need words only in the nominative or infinitive, and I find the same words in the list several times but in different cases?
@SketchEngine 3 years ago
The solution depends on the corpus, the language and the result screen. Please produce the result again (even if it contains unwanted items) and, from the result screen, contact our support via "Request help or support" at the very top right of the screen.
@amanmatebie5147 11 months ago
Can I create a corpus using around 1000 words? I want to analyze my students' paragraph writing.
@SketchEngine 11 months ago
You can build a corpus of any size. The minimum is 1 word and there is no maximum. However, 1000 words is far too little for Sketch Engine to provide any usable results. Corpus tools are only useful if they are used to analyse large quantities of text. They are mainly intended for quantities which would be unrealistic to read and analyse manually.
@amanmatebie5147 11 months ago
@SketchEngine Thanks
@behnambazmi2521 2 years ago
Hello, I am studying the use of emoji in a corpus. Can it also recognise emojis?
@SketchEngine 2 years ago
Hi, you can directly copy an emoji and use it in a Concordance search. You can copy emojis from here: getemoji.com/ Since emojis are tagged as SYM (= Symbol), you can also use this CQL query on the Advanced tab of the Concordance function: [tag="SYM"] (for instance in the English corpora), and then use the Frequency function to get the frequencies of all the symbols that are found.
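For a quick look at which symbols occur in a text before it is uploaded, a similar count can be approximated locally. The sketch below is plain Python, not Sketch Engine functionality: it assumes that "symbol" means a character in the Unicode category So (Symbol, other), which covers most emoji but is not the same as the SYM tag, and it counts single code points, so multi-part emoji sequences are split. The function name `symbol_frequencies` is made up for this example.

```python
import unicodedata
from collections import Counter

def symbol_frequencies(text):
    """Count characters whose Unicode category is 'So' (Symbol, other)."""
    return Counter(ch for ch in text if unicodedata.category(ch) == "So")

# Sample text with a grinning face (twice) and a heavy black heart (once).
sample = "Great talk \U0001F600\U0001F600 see you \u2764"
for symbol, count in symbol_frequencies(sample).most_common():
    print(symbol, count)
```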
@charlottepotts814 6 years ago
Hi, if for example you have a corpus that is sub-divided into different sections, are you able to analyse the whole corpus and the individual sub sections in order to compare them?
@SketchEngine 6 years ago
Yes, this is indeed possible. A corpus can be divided into parts called subcorpora, and they can be analysed separately or contrasted. You can even divide the corpora we created into your own parts if you wish. Here are more details: www.sketchengine.eu/user-guide/user-manual/corpora/create-a-subcorpus/ or contact us via support@sketchengine.eu
@carlosdanielmoralestorres8542 3 years ago
Excuse me, I am trying to create a synonym translator. I just created my own corpus in Sketch Engine, but can I upload it as a ZIP file to submit to my algorithm and try it out?
@SketchEngine 3 years ago
Yes, you can upload ZIP files to Sketch Engine. You can also export your corpus from Sketch Engine as plain text or as a vertical file: www.sketchengine.eu/guide/download-a-corpus/
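An exported vertical file can be fed into your own code with very little parsing. The sketch below assumes the usual vertical layout: one token per line, attributes separated by tabs, and structures such as documents and sentences written as XML-like tags on their own lines. The column order (word, tag, lemma) and the function name `read_vertical` are assumptions for this example; check the attributes of your own corpus before relying on them.

```python
def read_vertical(lines, columns=("word", "tag", "lemma")):
    """Yield one dict per token line of a vertical file.

    Lines that look like structure tags (<doc>, <s>, </s>, ...) are skipped.
    The column names are an assumption; adjust them to your corpus.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        is_structure = (
            line.startswith("<") and line.endswith(">") and "\t" not in line
        )
        if is_structure:
            continue
        yield dict(zip(columns, line.split("\t")))

sample = [
    "<doc>",
    "<s>",
    "Dogs\tNNS\tdog",
    "bark\tVBP\tbark",
    "</s>",
    "</doc>",
]
tokens = list(read_vertical(sample))
print([token["lemma"] for token in tokens])
```

For a real export, pass an open file instead of the sample list, for example `read_vertical(open("corpus.vert", encoding="utf-8"))`, where the file name is hypothetical.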
@nicolyao5072 4 years ago
With PDF and txt files, how can I use a corpus to analyse information? For example, I have a speech and I want to know the total number of words and the content.
@SketchEngine 4 years ago
To get the number of words and their frequencies, use the wordlist tool: kzbin.info/www/bejne/pKLTdHx_eKh4mtk
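To make clear what a word count and a frequency list are, here is a minimal local sketch in Python. It is not the wordlist tool: the tokenisation is a deliberately naive assumption (lowercased runs of letters and digits), so the numbers will differ from what Sketch Engine reports. The function name `word_frequencies` is made up for this example.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Return (total word count, Counter of lowercased word forms).

    Naive tokenisation: every run of letters, digits or underscores
    counts as one word.
    """
    words = re.findall(r"\w+", text.lower())
    return len(words), Counter(words)

speech = "The cat and the hat."
total, frequencies = word_frequencies(speech)
print(total)
print(frequencies.most_common(3))
```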
@oDinarte 3 years ago
Hi! How to do a corpus analysis of a single newsletter?
@SketchEngine 3 years ago
This is no different to analysing multiple newsletters or a huge corpus. The tools and procedures are exactly the same. Please look at our other videos or here www.sketchengine.eu/what-can-sketch-engine-do/ and here www.sketchengine.eu/quick-start-guide/ to see what Sketch Engine can do. Maybe if you have a more specific question, we can provide a more relevant answer.
@oDinarte 3 years ago
@SketchEngine Well, I have to do a corpus analysis of a newsletter (only one) for an Applied Linguistics course, but I don't know if it is possible to use a program to analyse such a small piece of information, or if I can do it manually... and how would I do it manually? A newsletter is not only about text, but my topic is "corpus analysis of a newsletter".
@SketchEngine 3 years ago
@oDinarte There is no limit to the size of text you can upload. So yes, you can analyse a newsletter in Sketch Engine no matter how short it is. We can help with using Sketch Engine, but we cannot help with decisions as to what you should analyse. These are decisions which you have to take first. When you know what linguistic phenomena you want to analyse and what you want to achieve or show, we can point you to the relevant Sketch Engine functionality to achieve that.
@oDinarte 3 years ago
@SketchEngine Thank you! That was helpful. Have a great day (:
@Ele-yz3nh 4 years ago
Is there a guide to analysing a corpus with Sketch Engine?
@SketchEngine 4 years ago
The best way to start is to watch the videos on our channel. Alternatively, you can visit www.sketchengine.eu/quick-start-guide/ and www.sketchengine.eu/guide/
@Jerry-um6iy 3 years ago
Can I download the corpus file?
@SketchEngine 3 years ago
Yes, you can. Here is how: www.sketchengine.eu/guide/download-a-corpus/
@Jerry-um6iy 3 years ago
@SketchEngine Thanks!!!
@dipsikhaphukan5563 4 years ago
Do you support the Assamese language?
@SketchEngine 4 years ago
Yes, of course! Sketch Engine is language independent. Any language will work. The only limitation is the absence of part-of-speech tagging and lemmatization for Assamese. The functionality will sometimes be limited as detailed here: www.sketchengine.eu/corpora-and-languages/unsupported-language/
@amazingvideoswithyasser9574 1 year ago
Is it free?
@SketchEngine 1 year ago
You can start with a free trial account: auth.sketchengine.eu/#register/form?form=trial and then continue with a paid subscription: www.sketchengine.eu/price-list/