Reliability 4: Cohen's Kappa and inter-rater agreement

61,477 views

Vahid Aryadoust, PhD

Comments: 84
@qiaoxiawu3773 · 3 years ago
I have watched three videos on Cohen's Kappa this morning, and this is the clearest and most helpful to me; it was very easy to follow. Thank you for making this video.
@subhrajitlahiri · 4 years ago
One of the best explanations of Cohen's Kappa so far!
@soehartosoeharto8471 · 4 years ago
Thanks, Prof. Vahid. I will wait for the inter-rater reliability video using Facets as well. Your videos help me a lot.
@VahidAryadoust · 4 years ago
Sure, Semesta; I am working on it :-)
@WangyinLi-j4b · 11 months ago
Excellent video! Very clear!
@Fentosecava · 3 years ago
Thanks. Clear, simple, and to-the-point explanations.
@shaileshjaiswal3434 · 4 years ago
Thanks a ton for making this video, Sir!
@SPORTSCIENCEps · 3 years ago
Thank you for uploading it!
@밍찡-e8q · 3 years ago
Very, very helpful! Thanks, Dr. :)
@rahilaanwar8513 · 3 years ago
Very well explained! Such a useful link! Thank you.
@surangadassanayake2088 · 4 years ago
A very useful explanation.
@selamatsave3745 · 3 years ago
Thank you, Dr., for the video explanation. What about if there are 3 raters?
@VahidAryadoust · 3 years ago
For that, you should use Fleiss' kappa or many-facet Rasch measurement. Please watch: kzbin.info/www/bejne/eKvPlqqLbsdnlaM
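For readers working outside SPSS, a minimal Python sketch of Fleiss' kappa for three or more raters might look like the following; the statsmodels functions and the ratings matrix are illustrative assumptions and are not part of the video.

```python
# Illustrative sketch (not from the video): Fleiss' kappa for 3+ raters.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Rows = rated subjects (e.g., essays), columns = raters, values = assigned category.
ratings = np.array([
    [1, 1, 2],
    [2, 2, 2],
    [1, 2, 1],
    [3, 3, 3],
    [2, 2, 3],
    [1, 1, 1],
])

# aggregate_raters converts the rater columns into per-subject category counts,
# which is the table format fleiss_kappa expects.
counts, categories = aggregate_raters(ratings)
print("Fleiss' kappa:", fleiss_kappa(counts, method="fleiss"))
```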
@stellazhai3587 · 3 years ago
Hello Dr. Aryadoust, thank you very much for your insightful and detailed explanation of this. I have a question regarding inter-rater agreement. My data has 2 coders who coded a set of sentences collected from research subjects, and each coder categorized the sentences as either a positive expression or a negative expression. If that's the case, what would be a good way to check inter-rater agreement? Thank you for your answer.
@VahidAryadoust · 3 years ago
Agreement is always good to check. If you want to get a bird's-eye view of agreement across all the ratings, use Fleiss's Kappa.
@Yu_SeniorFitness · 3 years ago
Thank you for the video, it is really helpful! May I ask a question? Two raters are assessing 20 papers and giving scores of 1 or 2, but Rater B scored 1 on every paper. Is kappa applicable? (The table is no longer 2*2.) Thanks again!
@VahidAryadoust · 3 years ago
I suggest you run a simple agreement analysis, which is discussed in this video series.
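As an illustration of that suggestion (with made-up scores, not data from the video), a simple percent-agreement check in Python could look like this:

```python
# Illustrative sketch: simple (percent) agreement for two raters when one
# rater is constant, which makes Cohen's kappa uninformative.
import numpy as np

rater_a = np.array([1, 2, 1, 1, 2, 1, 1, 1, 2, 1])
rater_b = np.ones(10, dtype=int)  # Rater B gave 1 to every paper

agreement = np.mean(rater_a == rater_b)  # proportion of papers with identical scores
print(f"Simple agreement: {agreement:.0%}")
```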
@Yu_SeniorFitness · 3 years ago
@@VahidAryadoust Thanks a lot!
@lellazahra5620 · a year ago
Thank you for your explanation. Could you maybe explain the difference between the kappa obtained via Crosstabs and the kappa obtained via Scale > Weighted Kappa analysis? Right now I am doing research with two assessors and I want to perform a bootstrap method as well. Do you know how to interpret the output if you use the bootstrap method for kappa via Crosstabs?
@chokriimed8419 · a year ago
Thank you for the explanation. If we have more than 2 raters (example: 10 raters), what should we use as a reliability test?
@VahidAryadoust · a year ago
In that case, either use the many-facets Rasch model or generalizability theory.
@chokriimed8419 · a year ago
@@VahidAryadoust When executing the many-facets Rasch model, this error message is displayed: "Error F36: All data eliminated as extreme." Is there a solution?
@VahidAryadoust · a year ago
@@chokriimed8419 It seems that there is an error in the data file.
@chokriimed8419 · a year ago
@@VahidAryadoust I've realised that I forgot to make the separation column between the raters and the items... Thank you very much for your help.
@MysterieswithPippa · 2 years ago
Does anyone know how you would code when it comes to quotes? I need to do an inter-rater reliability check on my qualitative analysis, so I need to see whether the quotes I am using for each theme are right; I need someone else to decide with me whether they agree that each quote matches its theme.
@susunyein7317 · 4 years ago
May I ask a question? Can kappa be used if there are more than two raters?
@VahidAryadoust · 4 years ago
Cohen's Kappa is applicable to two-rater scenarios. If you have more than two raters, maybe use simple agreement analysis, which was covered in the video, or many-facet Rasch measurement (MFRM). I will soon do a video on MFRM.
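The video demonstrates the two-rater case in SPSS; as a hedged cross-check outside SPSS, the same statistic can be reproduced with scikit-learn or directly from the formula kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is chance agreement. The ratings below are made up for illustration.

```python
# Illustrative sketch: Cohen's kappa for two raters, via scikit-learn and by hand.
import numpy as np
from sklearn.metrics import cohen_kappa_score

rater_1 = np.array([2, 3, 3, 1, 2, 2, 3, 1, 2, 3])
rater_2 = np.array([2, 3, 2, 1, 2, 3, 3, 1, 2, 3])

print("kappa (sklearn):", cohen_kappa_score(rater_1, rater_2))

# Manual computation of kappa = (p_o - p_e) / (1 - p_e) for the same data.
categories = np.union1d(rater_1, rater_2)
p_o = np.mean(rater_1 == rater_2)  # observed agreement
p_e = sum(np.mean(rater_1 == c) * np.mean(rater_2 == c) for c in categories)  # chance agreement
print("kappa (manual):", (p_o - p_e) / (1 - p_e))
```

Both lines print the same value (0.6875 for this toy data), which is the chance-corrected agreement the video computes in SPSS.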
@susunyein7317 · 4 years ago
@@VahidAryadoust Thank you so much! Is it similar to Fleiss' Kappa? I am looking forward to your video on MFRM. With best regards, Sir.
@elik5485 · 4 years ago
@@VahidAryadoust I think Krippendorff's alpha (kalpha) will work as well for more than two raters.
@DnyaneshwarPanchaldsp · 3 years ago
Nicely explained.
@ordinarypeople4391 · 2 years ago
I have tried it, but the result in the Symmetric Measures table is about .000. What does it mean, my senior?
@Marcosls2015 · 3 years ago
Amazing lectures, thank you so much for sharing all of this! Please, may I ask you a question? If I want to compare the kappas obtained from readings done by 2 different pairs of readers ("k1" and "k2"), what would be the statistical test to prove that k1 and k2 are (or are not) statistically different? Again, thanks!
@VahidAryadoust · 3 years ago
To my knowledge, I am not sure this comparison would be possible.
@Marcosls2015 · 3 years ago
@@VahidAryadoust Thanks again for your time and for sharing your knowledge! Kind regards.
@tas1424 · 3 years ago
Hi, thank you for this video. I am conducting a risk of bias/quality assessment as part of a systematic review. The assessment considers the quality of five included studies using a list of preset questions, the answers to which are 'no detail', 'limited detail', or 'good detail', with each category allocated a score of 0, 1, or 2 respectively. There are 18 questions, so the maximum score for each paper is 36. There are two raters. My question is whether a kappa score should be obtained for each study individually and then the mean or range of the scores from all five studies reported, or whether there is a better approach to applying Cohen's kappa? Perhaps there is even a more appropriate test I could conduct?
@VahidAryadoust · 3 years ago
You could: 1) obtain a kappa score per study and provide their range in your report (a mean of kappas does not make a lot of sense); 2) use many-facets Rasch measurement (see the relevant videos in this channel); or 3) use Fleiss's kappa and run all the analyses in one go.
@tas1424 · 3 years ago
@@VahidAryadoust That's great, thanks!!
@CCamillaPedersen · 3 years ago
Great and simple explanation of Cohen's Kappa, thank you! A question: it looks as if the reliability test is carried out by calculating the number of times a category is coded (e.g., code 2), rather than whether the two raters have coded the same item/paper the same way (e.g., whether both raters have coded paper 2 as code 2). If this is the correct understanding, is there any way to test the agreement between two raters by comparing the coding of each item, using SPSS?
@VahidAryadoust · 3 years ago
If I understand the question correctly, you need to compute a simple agreement index, which is explained in the same video.
@shaileshjaiswal3434 · 4 years ago
I wonder if you could make a video on calculating Fleiss' Kappa as well. That would be of great help to this community. Also, there isn't a video on how to do it here. Looking forward. Thank you again. :D
@VahidAryadoust · 4 years ago
It is on my to-do list.
@kristinashatokhina7297 · 3 years ago
Thank you for this helpful video! I've been reading the comments and am wondering when to use Cohen's Kappa versus an ICC. In my scenario, I'm running reliability analyses for two studies. In the first study, two coders are reading assessment reports; for each report, the coder assigns a rating between 0 and 3 on 3 items. In the second study, two coders are reading court documents; for each document, they score many items (about 30?), some of which are dummy coded (0, 1) while others are ordinal, ranging from 1 to 4. Any insight would be greatly appreciated! Thank you!!
@VahidAryadoust · 3 years ago
In both cases, you can run Cohen's Kappa on single items (one at a time).
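A sketch of that per-item approach in Python (the codings are made up; scikit-learn assumed) might look like this; for the ordinal items, passing weights="linear" or weights="quadratic" to cohen_kappa_score would give a weighted kappa instead of the unweighted one.

```python
# Illustrative sketch: Cohen's kappa computed item by item for two coders.
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Rows = documents, columns = items; values = codes assigned by each coder.
coder_1 = np.array([[0, 2, 1],
                    [1, 3, 1],
                    [0, 2, 0],
                    [2, 3, 1],
                    [1, 1, 0]])
coder_2 = np.array([[0, 2, 1],
                    [1, 2, 1],
                    [0, 2, 0],
                    [2, 3, 2],
                    [1, 1, 0]])

for item in range(coder_1.shape[1]):
    k = cohen_kappa_score(coder_1[:, item], coder_2[:, item])
    print(f"Item {item + 1}: kappa = {k:.2f}")
```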
@kristinashatokhina7297 · 3 years ago
@@VahidAryadoust Thanks so much!
@kristinashatokhina7297 · 3 years ago
@@VahidAryadoust Hello again! Do you have a recommendation for how to calculate inter-rater reliability for nominal data? I have two raters, and the response options are different locations in no meaningful order (e.g., private residence, social club, etc.). Thank you!
@VahidAryadoust · 3 years ago
@@kristinashatokhina7297 Please use simple agreement rates, which are explained in the video.
@kristinashatokhina7297 · 3 years ago
@@VahidAryadoust Thank you!!
@nathaliafernandes3000 · 3 years ago
Thank you so much!
@nidhijoshi1532 · 2 years ago
Hello Sir, thank you for this video. I have developed a research questionnaire with Likert-scale ratings and conducted a test-retest of the questionnaire among 74 respondents. How can I assess the reliability of the developed questionnaire using the above method?
@VahidAryadoust · 2 years ago
You can use item correlations between time 1 and time 2, as well as the correlation between the composite scores (aggregated scores at time 1 vs. time 2).
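An illustrative Python sketch of that test-retest approach, assuming simulated Likert responses and the scipy package (not data from the video):

```python
# Illustrative sketch: test-retest reliability via item-level and composite correlations.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
time_1 = rng.integers(1, 6, size=(74, 10))  # 74 respondents, 10 Likert items (1-5)
time_2 = np.clip(time_1 + rng.integers(-1, 2, size=time_1.shape), 1, 5)  # retest drifts slightly

# Item-level test-retest correlations.
for item in range(time_1.shape[1]):
    r, _ = pearsonr(time_1[:, item], time_2[:, item])
    print(f"Item {item + 1}: r = {r:.2f}")

# Correlation between the composite (summed) scores at time 1 and time 2.
r_comp, p_comp = pearsonr(time_1.sum(axis=1), time_2.sum(axis=1))
print(f"Composite test-retest r = {r_comp:.2f} (p = {p_comp:.3g})")
```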
@DnyaneshwarPanchaldsp · 3 years ago
👌👌
@Az-uj6lr · 3 years ago
Thank you for the video! I want to ask: how do I do that with 12 raters?
@VahidAryadoust · 3 years ago
Try Fleiss' kappa.
@jorgemmmmteixeira · 3 years ago
Hi. What about calculating the sample size for kappa? Do you think it is problematic to set the null hypothesis at K=0.0? I believe this is the same as what others call setting K1=0.0, whereas many state that K1 should be the minimum expected kappa. Thanks.
@VahidAryadoust · 3 years ago
Sample size: there should be at least 5 observations per level.
@jorgemmmmteixeira · 3 years ago
@@VahidAryadoust Is "per level" the number of response options or the number of questions? In this case, I am talking about the reliability of a questionnaire. I was using this kind of calculator: wnarifin.github.io/ssc/sskappa.html Thanks.
@VahidAryadoust · 3 years ago
@@jorgemmmmteixeira I am not sure I understand the question. Can you rephrase it?
@jorgemmmmteixeira · 3 years ago
@@VahidAryadoust Sure. I am trying to assess the reliability of a questionnaire and to determine the optimal sample size. From what I have read, the key inputs for that are the minimum and expected kappa agreement (labelled K1 and K2 by some authors). My doubt is whether it is OK to set K1 at 0.0 for power calculations, or whether that is not recommended, since the minimum expected values are always higher than that. However, your take on this seems different. Are the levels you mentioned the number of response options? Per question or in the questionnaire as a whole?
@VahidAryadoust · 3 years ago
@@jorgemmmmteixeira 1. Kappa is not a suitable method for questionnaire validation. 2. The alpha level is to be set at 0.05 in kappa analysis. 3. For validation of questionnaires, use a different method, like Rasch measurement or factor analysis.
@MrJsanabria · 3 years ago
What would you do if kappa agreement is too low? Would you ask both coders to go through the codes together in order to align with each other and increase the kappa? Or what do you suggest?
@amir.hazwan · 11 months ago
As McHugh (2012) suggested, perhaps it's best to calculate both kappa and percent agreement. Re-training and calibration among coders may be necessary to get better agreement.
@hakanbayezit5908 · 4 years ago
Sir, two raters are assessing 80 essays and giving a score between 0 and 20 (content: 6, organization: 5, language use: 6, punctuation: 3). Can we use the kappa technique you taught above? Please advise.
@VahidAryadoust · 4 years ago
I suggest you use correlations or many-facet Rasch measurement for this scenario. See: kzbin.info/www/bejne/gmjcZYturJdsf5Y kzbin.info/www/bejne/gIeskmugqN92erM
@hakanbayezit5908 · 4 years ago
@@VahidAryadoust Sir, do you suggest using correlations to find the inter-rater reliability between two raters?
@hakanbayezit5908 · 4 years ago
Do you mean the intra-class correlation coefficient?
@riybad09 · 3 years ago
Hello Sir, I have a query: in the case of more than two readers (assume there are 4), how do I calculate it?
@VahidAryadoust · 3 years ago
You can use Fleiss' kappa.
@ehabyosfy1449 · 3 years ago
Thanks a lot, Prof., for this great presentation. I have a question: how can I interpret the value of kappa if its measure of agreement is 0.33 and its approx. sig. is .000?
@VahidAryadoust · 3 years ago
A kappa of 0.33 means the raters' agreement is 33% of the way from chance-level toward perfect agreement, and it is statistically significant. However, despite the statistical significance, 0.33 is a low / weak index.
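For readers who want the kappa estimate together with its standard error and significance test outside SPSS, here is a hedged sketch using statsmodels and a made-up 2x2 cross-tabulation of the two raters' categories (the counts are illustrative only):

```python
# Illustrative sketch: Cohen's kappa with standard error and confidence interval,
# computed from a contingency table of the two raters' categories.
import numpy as np
from statsmodels.stats.inter_rater import cohens_kappa

table = np.array([[20, 10],   # rows = Rater A's categories
                  [12, 18]])  # columns = Rater B's categories

res = cohens_kappa(table)
print(res)        # summary: kappa, its standard error, and the confidence interval
print(res.kappa)  # the point estimate on its own
```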
@sportychick69851 · 3 years ago
What if kappa is not significant? How could you interpret that?
@VahidAryadoust · 3 years ago
There would be no significant agreement between the reviewers: they have likely interpreted the rating scale quite differently, so retraining might be needed.
@joshboston2323 · 3 years ago
What does it mean when it says "No statistics are computed because rater 1 is a constant"?
@VahidAryadoust · 3 years ago
The grades / scores assigned by Rater 1 are all the same; for example, s/he gave 2 to all the students. Please also watch this: kzbin.info/www/bejne/rJKQhZ2sm7-Afs0
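A worked illustration of why a constant rater breaks kappa (made-up scores): when Rater 1 assigns the same category to everyone, observed agreement equals chance agreement, so the numerator of kappa is zero and the statistic carries no information, which is why SPSS declines to compute it.

```python
# Illustrative worked example: with a constant rater, p_o equals p_e,
# so kappa = (p_o - p_e) / (1 - p_e) = 0 and tells us nothing.
import numpy as np

rater_1 = np.array([2, 2, 2, 2, 2, 2, 2, 2])  # Rater 1 gave 2 to every student
rater_2 = np.array([2, 1, 2, 2, 1, 2, 2, 2])

p_o = np.mean(rater_1 == rater_2)
p_e = sum(np.mean(rater_1 == c) * np.mean(rater_2 == c)
          for c in np.union1d(rater_1, rater_2))
print(p_o, p_e, p_o - p_e)  # 0.75 0.75 0.0 -> the kappa numerator is zero
```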
@joshboston2323 · 3 years ago
@@VahidAryadoust Right, I got it. Thanks for the quick reply!
@madeleineingham1899 · 4 years ago
Thanks for the video! How would you interpret a negative kappa (e.g., -0.09)? Thanks!
@VahidAryadoust · 4 years ago
It means there is no agreement beyond chance between the raters; a negative value indicates that observed agreement is below the level expected by chance.
@kumkum12 · 2 years ago
Can this reliability method be used for a checklist for textbook analysis? I asked two subject experts to analyze 2 chapters of a book using a particular checklist, but the results of the two raters were different, i.e., the frequency of each item is not the same. Please help me with how to calculate the inter-rater reliability of my checklist. It would be very helpful for my research work.
@VahidAryadoust · 2 years ago
Based on the description you provided, it seems to be applicable to your scenario, but note that I might be wrong, as I do not know much about your study.
@novawinalda1082 · 4 years ago
How about when there are 3 raters? Would it be done the same way?
@VahidAryadoust · 4 years ago
@Nova Windala Yes, it will be the same, although I would suggest you use many-facet Rasch measurement in that case. Check this out: kzbin.info/www/bejne/lZK9Zat4ntJ_f9E
@olivianicastro5107 · 6 months ago
What if you have more than 2 raters?
@VahidAryadoust · 6 months ago
Use many-facets Rasch measurement.
@olivianicastro5107 · 6 months ago
@@VahidAryadoust I have 9 content experts, and I was reading in Polit and Beck that I have to use a modified kappa to eliminate chance agreement. Is that what you are referring to? I don't know what many-facets Rasch measurement is.
@VahidAryadoust · 6 months ago
@@olivianicastro5107 They may have referred to Fleiss' kappa, but I am not sure. Many-facet Rasch measurement (MFRM) is a different way of analyzing data. I have several videos on MFRM in this channel.
Cohen's Kappa (Inter-Rater-Reliability) · DATAtab · 11:05 · 54K views
"كان عليّ أكل بقايا الطعام قبل هذا اليوم 🥹"
00:40
Holly Wolly Bow Arabic
Рет қаралды 9 МЛН
Synyptas 4 | Арамызда бір сатқын бар ! | 4 Bolim
17:24
This mother's baby is too unreliable.
00:13
FUNNY XIAOTING 666
Рет қаралды 39 МЛН
Flipping Robot vs Heavier And Heavier Objects
00:34
Mark Rober
Рет қаралды 59 МЛН
Reliability 1: External reliability and rater reliability and agreement · 18:48
Calculating and Interpreting Cohen's Kappa in Excel · Dr. Todd Grande · 11:23 · 94K views
Kappa and Agreement · Epidemiology Stuff · 12:48 · 4K views
Weighted Cohen's Kappa (Inter-Rater-Reliability) · DATAtab · 11:56 · 11K views
Kappa Coefficient · Christian Hollmann · 4:29 · 166K views
Kappa Value Calculation | Reliability · Physiotutors · 3:29 · 129K views
R Tutorial: Test-retest reliability (using intraclass correlation) · Statistics Guides with Prof Paul Christiansen · 7:16 · 10K views
Calculating and Interpreting Cronbach's Alpha Using SPSS · Dr. Todd Grande · 8:18 · 401K views
"كان عليّ أكل بقايا الطعام قبل هذا اليوم 🥹"
00:40
Holly Wolly Bow Arabic
Рет қаралды 9 МЛН