Is there a way to turn the row headers into data values in your data table? I pulled the data in, and where you have 1, 2, 3, etc., my dataset has dates, but I would like them to become their own column, like num1 or num2. Is that possible? New to R, so anything helps!
@StatisticsGlobe · 14 hours ago
Hey, are you looking for this? statisticsglobe.com/convert-row-names-into-column-of-data-frame-in-r
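In case a quick sketch helps right here in the comments — base R only, with a hypothetical data frame whose row names hold the dates:

```r
# Hypothetical example: a data frame whose row names are dates
df <- data.frame(value = c(10, 20, 30),
                 row.names = c("2021-01-01", "2021-02-01", "2021-03-01"))

# Copy the row names into a regular column, then reset the row names
df$date <- rownames(df)
rownames(df) <- NULL
df
```

The tibble package also provides tibble::rownames_to_column() for the same task.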
@anitagodwin8067 · 2 days ago
This has really helped me. Thank you!
@tomasbeltran04050 · 5 days ago
Is solve() better than inv(), or is it just my impression?
@StatisticsGlobe · 14 hours ago
Hey, I haven’t done any direct comparisons, but I believe using solve() for this purpose is standard practice in R.
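For anyone comparing the two, here is a small sketch of what solve() does (base R only; inv() would come from an add-on package such as matlib):

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)

A_inv <- solve(A)  # with one argument, solve() returns the matrix inverse
b <- c(1, 2)
x <- solve(A, b)   # with two arguments it solves A %*% x = b directly,
                   # which is generally preferred over inv(A) %*% b
```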
@dimitriveldkornet6708 · 5 days ago
Very well explained. Can I do correspondence analysis using the same package?
@StatisticsGlobe · 14 hours ago
Thanks! I recommend using the FactoMineR package for correspondence analysis. It’s robust, widely used, and works seamlessly with factoextra for visualizations.
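A minimal sketch of what that might look like, using a small made-up contingency table (the data here are purely hypothetical; see the FactoMineR documentation for the CA() details):

```r
library(FactoMineR)

# Hypothetical contingency table: hair color by eye color
tab <- matrix(c(30, 10,  5,
                 8, 25,  7,
                 4,  6, 20),
              nrow = 3, byrow = TRUE,
              dimnames = list(hair = c("dark", "fair", "red"),
                              eye  = c("brown", "blue", "green")))

# Correspondence analysis on the table (graph = FALSE suppresses base plots)
res_ca <- CA(as.data.frame(tab), graph = FALSE)
res_ca$eig  # eigenvalues and explained inertia per dimension
```

With factoextra installed, fviz_ca_biplot(res_ca) then draws rows and columns in one map.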
@usmansheikh-v1l · 7 days ago
I am facing this error. Can you please help in resolving it?

Error in inputm[, cached.keys(k, rel)] : subscript out of bounds
Calls: klink2 -> infer -> conn.vector -> conn_vector_C
Execution halted
@StatisticsGlobe · 6 days ago
Hey, could you please share your code and the structure of your data?
@usmanjavedsheikh495 · 6 days ago
@StatisticsGlobe Sure, can I send it to you via email?
@StatisticsGlobe · 5 days ago
It would be great if you could post it here, so that others can read/contribute as well. Thanks!
@usmansheikh-v1l · 5 days ago
@StatisticsGlobe This is my input.R file:

# Prepare dataset as input to Klink-2.
data_dir <- "../../data/klink2/"
source("relations.R")
Rcpp::sourceCpp("utils.cpp")

# input relations taken into consideration
relations <- c("publication", "author", "venue", "area")
# number of relations
rn <- length(relations)
# length of saved cooccurrence values
m <- 100

# 4 input variables to Klink-2:
keywordsdb <- new.env(parent = globalenv(), hash = TRUE)
reldb_df <- list()
reldb_l <- list()
inputm <- matrix(, nrow = m, ncol = 0)

# reads dataset into global raw input variables: keywordsdb, reldb_df, reldb_l
read_dataset <- function(limit = -1, named_list) {
  file_name <- paste(data_dir, named_list, "/", named_list, ".tsv", sep = "")
  d <- read.csv(
    file_name,
    sep = "\t",
    header = TRUE,
    stringsAsFactors = FALSE,
    nrows = limit * 2
  )
  empty <- function(elem) is.null(elem) || elem == "" || all(trimws(elem) == "")
  # need only items with defined keywords fields (DE, ID)
  w <- c()
  for (i in 1:dim(d)[1]) {
    if (!(empty(d$DE[i]) && empty(d$ID[i]))) w <- c(i, w)
  }
  d <- d[w, ]
  if (limit > 0) d <- d[1:limit, ]
  # fields used: document title, authors, publication name, research areas, year
  d <- d[c("DE", "TI", "AU", "SO", "SC", "PY")]
  k <- 1
  a <- 1
  n <- nrow(d)
  process_item <- function(item) {
    cat("process article ", a, " out of ", n, " ")
    a <<- a + 1
    # Process keywords with error handling
    keywords <- tryCatch({
      unique(unlist(
        sapply(strsplit(tolower(item["DE"]), ";"), function(x) {
          x <- trimws(x)
          x[nzchar(x)]  # Only keep non-zero length strings
        })
      ))
    }, error = function(e) character(0))
    # Skip if no valid keywords
    if (length(keywords) == 0 || all(keywords == "")) return()
    newkeywords <- setdiff(keywords, ls(keywordsdb))
    oldkeywords <- setdiff(keywords, newkeywords)
    # Process other fields with error handling
    authors <- tryCatch({
      res <- unlist(sapply(strsplit(item["AU"], ";"), function(x) trimws(x[nzchar(x)])))
      if (is.null(res)) character(0) else as.vector(res)
    }, error = function(e) character(0))
    areas <- tryCatch({
      res <- unique(unlist(sapply(strsplit(tolower(item["SC"]), ";"), function(x) trimws(x[nzchar(x)]))))
      if (is.null(res)) character(0) else res
    }, error = function(e) character(0))
    venues <- if (!empty(item["SO"])) tolower(trimws(item["SO"])) else character(0)
    # Create relation vectors only for non-empty elements
    relation <- c(1, rep(2, length(authors)), 3, rep(4, length(areas)))
    entity <- c(item["TI"], authors, venues, areas)
    quantity <- rep(NA_integer_, length(entity))
    year <- as.numeric(rep(item["PY"], length(entity)))
    # Process new keywords
    for (i in seq_along(newkeywords)) {
      if (nzchar(newkeywords[i])) {  # Only process non-empty keywords
        keywordsdb[[newkeywords[i]]] <<- k
        reldb_l[[k]] <<- list()
        reldb_l[[k]]$publication <<- paste(item["TI"], year[1], sep = "_")
        reldb_l[[k]]$author <<- if (length(authors) > 0) vapply(authors, paste, "", year[1], sep = "_") else character(0)
        reldb_l[[k]]$venue <<- if (length(venues) > 0) paste(venues, year[1], sep = "_") else character(0)
        reldb_l[[k]]$area <<- if (length(areas) > 0) vapply(areas, paste, "", year[1], sep = "_") else character(0)
        names(reldb_l)[k] <<- newkeywords[i]
        reldb_df[[k]] <<- data.frame(relation, entity, quantity, year, stringsAsFactors = FALSE)
        names(reldb_df)[k] <<- newkeywords[i]
        k <<- k + 1
      }
    }
    # Process existing keywords
    for (i in seq_along(oldkeywords)) {
      if (nzchar(oldkeywords[i])) {
        index <- keywordsdb[[oldkeywords[i]]]
        reldb_l[[index]]$publication <<- c(reldb_l[[index]]$publication, paste(item["TI"], year[1], sep = "_"))
        reldb_l[[index]]$author <<- unique(c(reldb_l[[index]]$author, if (length(authors) > 0) vapply(authors, paste, "", year[1], sep = "_") else character(0)))
        reldb_l[[index]]$venue <<- unique(c(reldb_l[[index]]$venue, if (length(venues) > 0) paste(venues, year[1], sep = "_") else character(0)))
        reldb_l[[index]]$area <<- unique(c(reldb_l[[index]]$area, if (length(areas) > 0) vapply(areas, paste, "", year[1], sep = "_") else character(0)))
        reldb_df[[index]] <<- rbind(reldb_df[[index]], data.frame(relation, entity, quantity, year, stringsAsFactors = FALSE))
      }
    }
  }
  apply(d, 1, process_item)
  # Sort the fields in reldb_l
  for (i in 1:length(reldb_l)) {
    if (!is.null(reldb_l[[i]]$publication)) reldb_l[[i]]$publication <<- sort(reldb_l[[i]]$publication)
    if (!is.null(reldb_l[[i]]$author)) reldb_l[[i]]$author <<- sort(reldb_l[[i]]$author)
    if (!is.null(reldb_l[[i]]$venue)) reldb_l[[i]]$venue <<- sort(reldb_l[[i]]$venue)
    if (!is.null(reldb_l[[i]]$area)) reldb_l[[i]]$area <<- sort(reldb_l[[i]]$area)
  }
}

# Rest of the functions remain unchanged
entities_range <- function(reldb_l) {
  a <- Inf
  b <- 0
  for (i in 1:length(names(reldb_l))) {
    for (r in 1:rn) {
      t <- length(reldb_l[[i]][[r]])
      if (t < a) a <- t
      if (t > b) b <- t
    }
  }
  c(a, b)
}

cache_cooccurrence <- function() {
  n <- length(ls(keywordsdb))
  inputm <- matrix(0, nrow = m, ncol = n * 2 * rn)
  maxsize <- entities_range(reldb_l)[2]
  for (i in 1:n) {
    cat("process keyword ", i, " out of ", n, " ")
    irel <- reldb_l[[i]]
    for (r in 1:rn) {
      co_m <- calc_cooccurrence_C(n, m, i, r, reldb_l, maxsize)
      inputm[, cached.keys(i, r)] <- co_m[, 1]
      inputm[, cached.values(i, r)] <- co_m[, 2]
    }
  }
  inputm
}

run_all <- function(limit = -1, named_list) {
  keywordsdb <<- new.env(parent = globalenv(), hash = TRUE)
  reldb_df <<- list()
  reldb_l <<- list()
  read_dataset(limit, named_list)
  inputm <<- cache_cooccurrence()
  if (limit > 0) {
    fname <- paste(data_dir, named_list, "/", named_list, limit, ".Rdata", sep = "")
  } else {
    fname <- paste(data_dir, named_list, "/", named_list, ".Rdata", sep = "")
  }
  save("reldb_df", "reldb_l", "keywordsdb", "inputm", file = fname)
  cat("Input variables saved to", fname, " ")
}

inspect_dataset <- function(filename) {
  file_name <- paste(data_dir, filename, "/", filename, ".Rdata", sep = "")
  load(file_name)
  n <- length(reldb_df)
  entrange <- entities_range(reldb_l)
  cat("Number of keywords: ", n, " ")
  # cat("Range of entities associated with a keyword: [", entrange[1], ",", entrange[2], "] ")
  for (i in 1:rn) {
    cat(
      "Value of co-occurrence,", relations[i], "relation: [",
      min(inputm[, cached.values(1:n, i)]), ",",
      max(inputm[, cached.values(1:n, i)]), "] "
    )
  }
}
@usmansheikh-v1l · 5 days ago
I got the above error at the step "Run the Klink-2 Scripts".
@homaanrandm9441 · 9 days ago
Hi, I have a dataset on which I want to do descriptive analysis or logistic regression. Can you help? If yes, then we can talk.
@StatisticsGlobe · 8 days ago
Hey, I run a Facebook discussion group where people can ask questions about R programming and statistics. Could you post your question there? This way, others can contribute/read as well: facebook.com/groups/statisticsglobe For more detailed, personalized assistance, you may also take a look at our consulting page: statisticsglobe.com/consulting
@123peterjackson · 11 days ago
Question: how would I add multiple labels to an x-axis? I have a 13-week trial: the 1st week is a run-in phase, the next 6 weeks are a supplement phase, and the final 6 weeks are a washout phase. Is that even possible?
@StatisticsGlobe · 9 days ago
Hey, how about this?

library(ggplot2)

# Example data
data <- data.frame(
  week = 1:13,
  value = rnorm(13)
)

# Custom labels marking each phase
ggplot(data, aes(x = week, y = value)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(
    breaks = 1:13,
    labels = c("1 Run-in", "2", "3", "4", "5", "6", "7 Supplement",
               "8", "9", "10", "11", "12", "13 Washout")
  )
@oscarferrerlozano1155 · 13 days ago
One question: how can I change the dates of the seasons? Because it considers that September is autumn, but on some of those dates it is summer in my country.
@StatisticsGlobe · 9 days ago
Hey, you will need to define a custom function for this. The time2season function uses default boundaries for seasons, but it doesn't allow you to directly modify those defaults. Instead, you can create your own logic for assigning seasons based on your country's calendar. Here's how you can do it:

# Define example dates
my_dates <- as.Date(c("2022-10-01", "2021-05-13", "2025-12-01",
                      "2023-02-17", "2023-06-25", "2022-10-15"))
my_dates  # Print example dates

# Custom function to assign seasons
assign_season <- function(dates) {
  # Extract month and day
  month <- as.numeric(format(dates, "%m"))
  day <- as.numeric(format(dates, "%d"))
  # Assign seasons based on your custom boundaries
  ifelse((month == 12 & day >= 21) | month %in% c(1, 2) | (month == 3 & day < 21), "Winter",
  ifelse((month == 3 & day >= 21) | month %in% c(4, 5) | (month == 6 & day < 21), "Spring",
  ifelse((month == 6 & day >= 21) | month %in% c(7, 8) | (month == 9 & day < 21), "Summer",
         "Autumn")))
}

# Apply the custom function
my_seasons <- assign_season(my_dates)
my_seasons  # Print custom seasons
@Getalew · 13 days ago
Wonderful! Just out of curiosity: are we going to test the assumptions of ANOVA before we do the analysis, or after, as you did here in your video? Thank you.
@StatisticsGlobe · 9 days ago
Thanks! Both approaches are possible. You can test the assumptions either before or after performing ANOVA, as long as you ensure that the assumption tests are completed and any violations addressed before interpreting or sharing your results.
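Either way, the usual checks can be run on the fitted model in base R — a quick sketch using the built-in PlantGrowth data:

```r
# One-way ANOVA on a built-in example dataset
fit <- aov(weight ~ group, data = PlantGrowth)
summary(fit)

# Assumption checks on the fitted model:
shapiro.test(residuals(fit))                        # normality of residuals
bartlett.test(weight ~ group, data = PlantGrowth)   # homogeneity of variances
```

car::leveneTest() is a common, more robust alternative to bartlett.test().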
@Getalew · 13 days ago
I really enjoyed your lecture, and it was informative. Please also include comments to explain some of the code. Thank you.
@StatisticsGlobe · 9 days ago
Thank you for the kind comment and your feedback!
@poo9poo9ca9choo · 15 days ago
Good job. This was quite helpful.
@micha.statisticsglobe · 14 days ago
Thanks a lot for your kind feedback. Glad it helped! 🙂
@esenemrullah · 15 days ago
Thank you for a quick tutorial; that was quite easy compared to ggplot2. Subscribed!
@micha.statisticsglobe · 15 days ago
Thank you very much for your kind feedback! 🙂
@tomspoors768 · 20 days ago
Of course! One uses sum() to count. How silly of me! Thanks, Joachim
@StatisticsGlobe · 19 days ago
You are welcome, glad it was helpful!
@trilisser · 21 days ago
Bro this looks like vaginas
@warcoder · 22 days ago
Hi! How can I do this with multiple curves?
@StatisticsGlobe · 21 days ago
Hey, one solution could be to call geom_area multiple times.
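A minimal sketch of that idea, with made-up data (the variable names are just for illustration):

```r
library(ggplot2)

# Hypothetical data with two series
df <- data.frame(x  = 1:10,
                 y1 = cumsum(runif(10)),
                 y2 = cumsum(runif(10)))

# One geom_area() call per curve; alpha makes the overlap visible
p <- ggplot(df, aes(x = x)) +
  geom_area(aes(y = y1), fill = "steelblue", alpha = 0.4) +
  geom_area(aes(y = y2), fill = "tomato", alpha = 0.4)
p
```

Alternatively, reshape the data to long format and map the series to fill, so a single geom_area() call with position = "identity" draws all curves.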
@sodaerynzyrillg.7295 · 23 days ago
You are saving my group research paper right now. Thank you so much <3
@micha.statisticsglobe · 22 days ago
You're most welcome. Glad it helped! 🙂
@KarolKarasiewicz · 24 days ago
Patchwork and ggstats are the most useful in my opinion, but gganimate looks awesome.
@StatisticsGlobe · 23 days ago
Thanks for sharing your insights!
@noureddineabid8167 · 25 days ago
I like your videos; they are short and summarise the relevant information.
@micha.statisticsglobe · 23 days ago
Thank you very much for your kind feedback! 🙂
@WahranRai · 25 days ago
Good work; keep the code readable and understandable without using pipes.
@StatisticsGlobe · 25 days ago
Thanks a lot for the kind feedback! :)
@amevordoephelixkelvin3667 · 25 days ago
❤
@StatisticsGlobe · 25 days ago
Thanks, glad you like it!
@danielkwawuvi_tutorials · a month ago
Thank you for the nice demonstration
@micha.statisticsglobe · a month ago
You're most welcome! 🙂
@HuynhCamThaoTrang · a month ago
Thank you very much for your great explanation. It is extremely helpful for me!!!
@micha.statisticsglobe · a month ago
Thanks a lot for your kind feedback. Glad it helped! 🙂
@lupen2024-il2vc · a month ago
Great, but what if we have a data frame with many variables with outliers? Should we take a "one at a time" approach to get rid of the outliers?
@StatisticsGlobe · a month ago
Hey, for data frames with many variables containing outliers, it’s best to address outliers carefully, often one variable at a time. Rather than removing them outright, consider retaining all data and applying appropriate statistical methods to handle outliers, as they may hold valuable insights or represent unique cases.
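One way to sketch the "flag rather than delete" idea across all numeric columns, using the common 1.5 × IQR rule (the threshold is illustrative, not a universal recommendation):

```r
# Flag potential outliers in every numeric column of a data frame
flag_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[2] - q[1]
  x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr
}

out_flags <- sapply(mtcars, flag_outliers)  # built-in data as a stand-in
colSums(out_flags)  # number of flagged values per variable
```

Inspecting the flags first keeps the decision (remove, cap, or keep) separate from the detection step.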
@Gabriel-bw2in · a month ago
So weird to see Python in RStudio 😅
@StatisticsGlobe · a month ago
Haha, indeed that's something you have to get used to.
@eliyas8915 · a month ago
The above solutions return an empty data frame for me:

data %>% dplyr::filter(x1 %in% 3:5)
[1] x1 x2 x3 y z
<0 rows> (or 0-length row.names)

Instead, this one works for me:

data[3:5, ]
@StatisticsGlobe · a month ago
That's surprising. I just ran the code again and for me it works fine.
@eliyas8915 · a month ago
In my opinion, short videos are totally fine, but this type of problem needs a detailed explanation.
@StatisticsGlobe · a month ago
Thanks for the feedback. Indeed, this is a complex topic. Let me know if you have any specific questions.
@CarlosErnestoAlvarengaSantos · a month ago
I liked it. Thanks!
@micha.statisticsglobe · a month ago
Thank you very much for your kind feedback! 🙂
@eliyas8915 · a month ago
Your videos are very helpful for me. After your answers, I always try to get the same result with different methods. For example, I have modified the loop a little bit, which might be helpful to others:

# Add new columns directly by name
for (i in 1:3) {
  data[[paste0("new", i)]] <- rep(i, nrow(data))
}

# Loop to append rows
for (i in 1:2) {
  new_row <- rep(i, ncol(data))  # Create a new row with values of `i`
  data <- rbind(data, new_row)   # Append the new row to `data`
}
@StatisticsGlobe · a month ago
Thank you so much for the kind comment and for sharing your code!
@SofiA-nf7os · a month ago
Thank you so much for this video! You helped me with my thesis.
@micha.statisticsglobe · a month ago
You're most welcome. Glad it helped! 🙂
@peterwestermann5265 · a month ago
Super clear, thank you very much!!
@micha.statisticsglobe · a month ago
You're most welcome! 🙂
@samthenextgeneration · a month ago
Thank you for the video! This video was super helpful in explaining the concept.
@micha.statisticsglobe · a month ago
Thanks a lot for your kind feedback. Glad it helped! 🙂
@khinsoratana248 · a month ago
I've joined your group on Facebook, and now I have subscribed to your YouTube channel. For me, no matter how much you change your channel, I still enjoy your practical content.
@StatisticsGlobe · a month ago
Thank you so much for the very kind feedback, this is great to hear!
@AriseDoryneKalembe-ye9fh · a month ago
Thank you so much, Statistics Globe!
@micha.statisticsglobe · a month ago
You're most welcome! 🙂
@eliyas8915 · a month ago
Very helpful! Thank you very much! If anyone doesn't want to use qpcR:::cbind.na(vec1, vec2), you can simply transpose the data_rbind as t(bind_rows(vec1, vec2)).
@StatisticsGlobe · a month ago
Thank you for the kind comment and hint!
@KarolKarasiewicz · a month ago
Hi! Your videos are cool, thank you. Can you do a longer/advanced tutorial on regular expressions? It'd be great.
@StatisticsGlobe · a month ago
Hey, thank you so much for the kind comment and topic suggestion. I'll consider it for a future video.
@IsobelFrench · a month ago
How do I add a grid to a nice_violin plot? Thank you!
@StatisticsGlobe · a month ago
Hey, could you share your code?
@rashawnhoward564 · a month ago
Why import a package for Python and not R? system.time(dqrng::dqrnorm(100000000)) does this in about 1.3 secs.
@CarlosErnestoAlvarengaSantos · a month ago
Many thanks.
@micha.statisticsglobe · a month ago
You're most welcome! 🙂
@anindabhowmik5688 · a month ago
How can I add the dataset of 24 as the first row? Please help.
@StatisticsGlobe · a month ago
Hey, you may use the following code: rbind(new_data, old_data)
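For example, assuming old_data and new_data are data frames with the same columns (names here are hypothetical):

```r
old_data <- data.frame(x = 1:3, y = c("a", "b", "c"))
new_data <- data.frame(x = 24, y = "z")

combined <- rbind(new_data, old_data)  # the new row(s) end up on top
combined
```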
@leveluptennis5440 · a month ago
Excellent overview!
@micha.statisticsglobe · a month ago
Thank you very much for your kind feedback! 🙂
@eliyas8915 · a month ago
It was helpful, thanks! But we don't need to specify the +/- m inside the pipe; you can do the operation directly:

as.Date("2017-05-11") + months(1)
[1] "2017-06-11"

This one is very straightforward.
@StatisticsGlobe · a month ago
Thank you for the kind comment and the tip! It appears this method works as well. I'm not sure why I did it differently in the video, but perhaps this functionality was added more recently. The video is already over three years old.
@zeruyimer3764 · a month ago
Thanks a lot, Statistics Globe, for sharing the video with us; hopefully you will come back with another one.
@micha.statisticsglobe · a month ago
You're most welcome! 🙂
@darakhshannehal1828 · a month ago
glimpse() 🙌🏼. Nice quiz series, keep it up!
@StatisticsGlobe · a month ago
Thanks, glad you like it! :)
@nigussiekefelegn6764 · a month ago
I think it is better to rephrase my question as follows: How can we do association analysis between a discrete (nominal or categorical) variable and a continuous (quantitative) variable using R? Can you do one video on it? My second question: can we do association analysis between two categorical variables using R? Which model can be used?
@gopaltiwariful · a month ago
When I ran this code with my data, R showed "Error: unexpected invalid token in "my_pca"". Any suggestion?
@StatisticsGlobe · a month ago
Hey, did you run the code exactly as demonstrated in the video?
@gopaltiwariful · a month ago
@StatisticsGlobe Yes, as it is.
@StatisticsGlobe · a month ago
That's weird, to be honest, I don't know why this is happening. On my side, everything works fine.