Genomics in practice - Genotype data format change with PLINK

Views: 10,386

Genomics Boot Camp


A day ago

Comments: 49
@Fasilgetachew 3 years ago
Thanks, Professor. Your channel is an excellent support in analyses of my data. Keep up the nice work.
@mdrasheduzzaman7613 3 years ago
Thanks a lot, Professor. It helps a lot. I ran into a problem: I was using the GWAS PLINK type, and plink was not finding the options "--file", "--freq", "--recode", "--out", etc. Using "./plink" instead of "plink" solved the issue. Maybe this piece of info will help others and save them a lot of time. :) Thanks a lot, again.
@GenomicsBootCamp 3 years ago
Thanks! Indeed, "./plink" should be used instead of "plink" on Linux, and probably also on Mac. The test runs and the video were done on Windows 10.
@baharehbehrooziasl9517 2 years ago
Hi, I am running R on a Linux system. When I called PLINK, it ran normally. However, when I tried the first line of code to change the format, I received an error message. I tried ".\plink" but it gave me an error. Do you have any suggestions?
@mdrasheduzzaman7613 2 years ago
@@baharehbehrooziasl9517, Hello. I think it is just a typo. It should be a forward slash, not a backslash (the backslash is used in Windows file paths). So try "./plink" with a forward slash. Let us know if it solves the issue.
@baharehbehrooziasl9517 2 years ago
@@mdrasheduzzaman7613 Thanks for your prompt response. Sorry for the typo, I actually used "./plink" in the code, but it did not work. It gave me the error "error in running command". However, when I used "plink", it ran normally but gave me the error: unknown option "--bfile" (and the same for "--recode" and "--out") when I tried to recode the dataset.
@mdrasheduzzaman7613 2 years ago
@@baharehbehrooziasl9517, I think the probable reason is that your plink executable is not in the working directory you set for your R session. So try resetting the working directory to the folder where your plink file is. Note: put all your data files and the plink file in the same directory, then run the code to see if it works correctly.
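The advice in this thread can be sketched as a small shell helper that decides how to call PLINK before any conversion is run. This is a minimal sketch under assumptions: find_plink is a hypothetical helper name, and it assumes the PLINK executable is called plink and sits either in the current working directory or on the PATH. On some Linux systems a bare "plink" can also resolve to an unrelated program with the same name (PuTTY ships a command-line tool called plink), which would be consistent with the "unknown option" errors reported above, although the thread does not confirm that this was the cause.

```shell
# find_plink is a hypothetical helper: it prints how PLINK should be called.
find_plink() {
    if [ -x ./plink ]; then
        # A copy in the current working directory takes priority.
        echo "./plink"
    elif command -v plink >/dev/null 2>&1; then
        # Otherwise report whatever "plink" resolves to on the PATH.
        command -v plink
    else
        echo "no plink executable found in $(pwd) or on the PATH" >&2
        return 1
    fi
}
```

Running find_plink from the folder that R reports with getwd() shows which executable a system() call would pick up. Calling the printed path with --version should then show a PLINK banner if it really is the genotype tool.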
@moslemmoghbeli4325 8 months ago
Thank you for all of the videos.
@seyedhashemi9636 3 years ago
Thanks for such an informative video!
@yawpr3ko837 1 year ago
How do I convert a CSV file to a PED or MAP file?
@georgewanjala4605 3 years ago
Professor, I would like to know how to use subfolders (sub-directories) in the main directory, i.e. if I have some clustered datasets saved in subfolders and I want plink and R to read from them and save output there directly to avoid jamming data in the main folder. Otherwise, I enjoy following your tutorials repeatedly, they are elaborative and very helpful.
@GenomicsBootCamp 3 years ago
George! Thank you for the comment and the question! Somehow I did not think about this before, but it is a great improvement to data management (I will make a video about this and credit you, if you agree). As for the answer:
1) Create a subdirectory, e.g. output (it can be named anything you want).
2) Specify the --out option as: --out output/outFileName, where "outFileName" is the name you would normally give in the --out option.
3) The same works for input, e.g. --file data/inputData, where data is the subfolder in which inputData.ped and inputData.map are stored.
4) I tested this on a Windows 10 system with the forward slash (/) shown above. Linux and Mac use the forward slash as their path separator, so the same form should work there; the backslash (\) is specific to Windows.
@georgewanjala4605 3 years ago
@@GenomicsBootCamp Thank you so much, Professor.
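The subfolder layout described in this reply could look like the following shell sketch. The names data, output, inputData and outFileName are the placeholders from the reply. The mkdir step is an added assumption, on the expectation that PLINK writes into an existing folder rather than creating one.

```shell
# Placeholder names taken from the reply above; adjust them to your own files.
in_prefix="data/inputData"          # expects data/inputData.ped and data/inputData.map
out_dir="output"
out_prefix="$out_dir/outFileName"

# Make sure the output folder exists before PLINK tries to write into it.
mkdir -p "$out_dir"

if [ -x ./plink ]; then
    # Read from the data subfolder, write the recoded files into the output subfolder.
    ./plink --file "$in_prefix" --recode --out "$out_prefix"
else
    echo "plink executable not found in $(pwd)" >&2
fi
```

Keeping the prefixes in variables means the same two lines can be reused for each clustered dataset by changing only the subfolder names.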
@ashvinkumarkatral1978 1 year ago
Thank you very much for a handy topic. I am trying to convert a VCF file to PLINK binary files, but I run into a problem: I end up with "Error: Invalid alternate allele on line 23 of --vcf file". I checked the data file and could not find any error in the data format. Please suggest the best way out. Thank you very much, Sir.
@GenomicsBootCamp 1 year ago
Hi, I don't know what the problem could be, but my first thought would be to compare that line with e.g. line 22. That one did not raise an error, so it should be OK. Then look for anything that is different, especially in the alternate allele column. Maybe it is missing, or there is a weird character there? But maybe the problem is elsewhere on the line. Also, as a trial, I would manually delete that line from the VCF file and see if the problem remains. If line 23 is still reported, it might be something around it. If a different line is reported for the same error (e.g. line 100), you can look at the faulty lines 23 and 100 and see what they have in common. Not a very scientific approach, but worth a try.
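The comparison suggested here can be done without opening the file in an editor. A minimal sketch, assuming a tab-separated, uncompressed VCF in which the alternate allele is the fifth column, and assuming the line number in the error message counts every line of the file, header included. show_alt and drop_line are hypothetical helper names.

```shell
# Print the line number and the ALT field (5th tab-separated column) of two lines.
show_alt() {
    # $1 = VCF file, $2 and $3 = line numbers to compare
    awk -F'\t' -v a="$2" -v b="$3" 'NR == a || NR == b { print NR ": ALT=[" $5 "]" }' "$1"
}

# Write a copy of the file without one line, leaving the original untouched.
drop_line() {
    # $1 = input VCF, $2 = line number to drop, $3 = output VCF
    sed "${2}d" "$1" > "$3"
}
```

For the case above, show_alt mydata.vcf 22 23 prints the two ALT fields side by side, and the square brackets make an empty field or stray spaces visible. drop_line mydata.vcf 23 trial.vcf produces the trial file without the reported line (mydata.vcf and trial.vcf are placeholder names).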
@adamramses9722 3 years ago
Thanks for your videos, Professor, they are a great help indeed. I would love a video about converting the bed/bim/fam format into a raw data format like the 23andMe file format, so it can be used for analysis on GEDmatch and similar websites.
@GenomicsBootCamp 3 years ago
Hi! At some point, I want to make a larger video on all #PLINK input/output files... For now, for your question you may try the line: plink --bfile [binary ped file] --recode 23 --out outputName You need to extend it with --chr-set or similar if you have non-human data. This approach handles one individual at a time. Thinking about it now, this info with some more details might be a decent video... Thanks for the suggestion!
@adamramses9722 3 years ago
@@GenomicsBootCamp Thanks so much for your answer, Professor. I would love a video about that, it would be really helpful. I tried to follow your instructions, but I guess I am missing something. I am trying to convert a bed file that has lots of samples, and I keep getting the error: Error: --recode 23 can only be used on a file with exactly one sample.
@GenomicsBootCamp 3 years ago
@@adamramses9722 Yes, --recode 23 seems to work that way. So you need to combine it with a --keep option in the same line, keeping only a single individual at a time. The --keep option is explained in the video "How to select and remove individuals in PLINK" on this channel. So you can go ahead and try this. One additional issue is that if you have many individuals, the manual approach could be tedious, so you need a looped solution that runs things automatically. I will also try to provide such a solution in a video.
@adamramses9722 3 years ago
@@GenomicsBootCamp Thanks so much for your guidance, Professor. I tried to search for this but sadly didn't find much information, so a video about it would be really helpful.
@GenomicsBootCamp 3 years ago
The video on changing PLINK to 23andMe comes tomorrow. Thanks for the idea!
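The looped solution mentioned in this thread could be sketched in shell as follows. It assumes a PLINK 1.9 binary fileset, uses the first two columns of the .fam file (family ID and individual ID) to write one --keep file per individual, and then runs --recode 23 once per file. The helper names, folder names, and output naming are illustrative choices, not the method shown in the channel's video.

```shell
# Write one keep file per individual, named <FID>_<IID>.txt.
make_keep_files() {
    # $1 = .fam file, $2 = folder for the keep files
    mkdir -p "$2"
    # Columns 1 and 2 of a .fam file are the family ID and the individual ID.
    awk -v dir="$2" '{ f = dir "/" $1 "_" $2 ".txt"; print $1, $2 > f; close(f) }' "$1"
}

# Run --recode 23 once per keep file, so each run sees exactly one sample.
recode_each_to_23() {
    # $1 = binary fileset prefix, $2 = folder with keep files, $3 = output folder
    mkdir -p "$3"
    for keep in "$2"/*.txt; do
        name=$(basename "$keep" .txt)
        ./plink --bfile "$1" --keep "$keep" --recode 23 --out "$3/$name"
    done
}
```

With a fileset called mydata (a placeholder), make_keep_files mydata.fam keep_files followed by recode_each_to_23 mydata keep_files out23 would leave one 23andMe-style file per individual in out23. IDs that contain a slash would need a different file naming scheme, and non-human data still needs --chr-set or similar, as noted above.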
@AmitabhBiswas 2 years ago
I got an error while running the first --bfile command: Error: --export requires at least one output format. (Did you forget 'ped' or 'vcf'?)
@GenomicsBootCamp 2 years ago
The error message seems to point towards the output file, so: 1) check that you have the .bim, .bed, and .fam files in the same directory where you run PLINK, just to be sure; 2) perhaps you did not specify what type of output you want, so if you do not have e.g. --recode or --make-bed in the PLINK line, just add it there.
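The wording of this error (--export) suggests the command may have been run with PLINK 2 rather than PLINK 1.9, where a bare --recode does not name an output format. A minimal sketch of choosing the flag from the version banner, assuming the banner printed by --version starts with "PLINK v2" for PLINK 2 and with "PLINK v1" for 1.x. ped_flag_for_version is a hypothetical helper, and inputName and outputName are placeholders.

```shell
# Pick the option that asks for .ped/.map output, based on the version banner.
ped_flag_for_version() {
    # $1 = text printed by "plink --version"
    case "$1" in
        "PLINK v2"*) echo "--export ped" ;;  # PLINK 2: the format must be named
        *)           echo "--recode" ;;      # PLINK 1.x: --recode alone writes .ped/.map
    esac
}

# Example use (not run here):
#   flag=$(ped_flag_for_version "$(./plink --version)")
#   ./plink --bfile inputName $flag --out outputName
```

The flag variable is left unquoted in the example on purpose, so that "--export ped" is passed to PLINK as two separate arguments.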
@liutrvcyrsui 2 years ago
Thanks for the video. Can PLINK generate the Mean Genotype File Format?
@GenomicsBootCamp 2 years ago
Hi, to my knowledge it cannot, but you could check the --recode option and prepare the files for another program which does. www.cog-genomics.org/plink/1.9/data#recode In particular, the BimBam program seems promising. From Appendix 1 of its manual: "Imputation without panel. ./bimbam -g input/cohort.txt -p input/pheno.txt -e 10 -s 20 -c 15 -o pref -wmg This command line asks bimbam to run EM 10 times, each EM run 20 steps. After imputation, output mean genotypes." www.haplotype.org/download/bimbam-manual.pdf
@georgewanjala4605 2 years ago
@Genomic Boot Camp, would you please advise on how to convert PLINK files to the FASTA or Arlequin file format?
@GenomicsBootCamp 2 years ago
Here seems to be an easy way, but you need Perl: github.com/gungorbudak/ped2fasta
@georgewanjala4605 2 years ago
@@GenomicsBootCamp, Thank you, Professor.
@kashifkhan-xr8fj 1 year ago
Hello sir... Could you please help me convert SNP genotype data in txt format into ped and map files?
@GenomicsBootCamp 1 year ago
Is it close to any of the PLINK input file formats? See: kzbin.info/www/bejne/kIPcl6ObZt-kjMk If yes, adapt to it and use --recode to get ped+map
@kashifkhan-xr8fj 1 year ago
@@GenomicsBootCamp Thank you for your reply, sir. My data doesn't match any of these formats. It's Affymetrix 50k genotype data with columns like Probeset ID, animal IDs (90 samples) with AA, AB, BB genotypes, Affy SNP ID, chr ID, start, strand, dbSNP RS ID, etc.
@georgewanjala4605 3 years ago
Dear Professor, I have encountered some errors. Please check your email for screenshots; I tried to share them here but was unable to. Thanks.
@minakshi3645 2 years ago
Can you help me change a txt format to ped? Do I need to remove the extra information present in my file? (I downloaded SNP data from the UCSC browser and the GWAS Catalog.)
@GenomicsBootCamp 2 years ago
Hi, without the exact format it is hard to suggest a solution. Could you give an example of which columns are present and how the file is formatted? E.g. is it one line per SNP or one line per individual?
@minakshi3645 2 years ago
@@GenomicsBootCamp Thank you for replying. If I download my data from the GWAS Catalog, it comes in TSV format with the following columns: author, date, journal, link, study, disease, sample size, region, chromosome ID, mapped gene, SNP ID, strongest SNP risk allele, p value, p value mlog, CNV. And if I fetch my data in CSV format from the UCSC browser for the APOE gene, it has the following columns: name, chromosome, strand, txStart, txEnd, cdsStart, cdsEnd, exon count, exon starts, exon ends, protein ID, align ID. These are my formats, but the PLINK format is different. Can you please tell me where to fetch SNP data for human disease, or how to change it into the format PLINK requires?
@vinaymore8210 3 years ago
How do I convert a VCF file to ped and map, and what is a .tbi file?
@GenomicsBootCamp 3 years ago
Hi, the file conversion will be discussed in the video tomorrow (9 June). With .tbi files I do not have much experience, but they seem to be some kind of index files for VCF.
@reemalsaidi8664 8 months ago
I receive this message: "15761 MB RAM detected; reserving 7880 MB for main workspace. Error: Failed to open ADAPTmap_genotypeTOP_20160222_full.map." Also, can I convert to CSV format?
@GenomicsBootCamp 8 months ago
The "Error: Failed to open..." message usually refers to a missing file. Do you have that map and ped file, with that exact name, in your working directory?
@kanatyermekbayev9 3 years ago
Hello, if one has .map and .ped files (instead of the three you indicated), how should they be loaded into PLINK using R? Thanks
@GenomicsBootCamp 3 years ago
Hi, with .ped and .map files you need to use the "--file" option instead of the "--bfile" used in the video. So you only need to delete the "b" and update the file name to yours. Also, you don't need to upload anything, just have the .ped and .map files in your working directory, as in the video.
@kanatyermekbayev9 3 years ago
@@GenomicsBootCamp Thanks for the response. The problem was with PLINK version 2. I downloaded 1.9 and it is working well.
@GenomicsBootCamp 3 years ago
@@kanatyermekbayev9 Thanks for the clarification!
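Which input option fits can be read off the files that are actually present. A minimal sketch for PLINK 1.9, assuming all files share one prefix and sit in the working directory; input_flag is a hypothetical helper name and myData is a placeholder prefix.

```shell
# Print the PLINK input option that matches the files found for a given prefix.
input_flag() {
    # $1 = file prefix without extension, e.g. myData
    if [ -f "$1.bed" ] && [ -f "$1.bim" ] && [ -f "$1.fam" ]; then
        echo "--bfile"   # binary fileset: .bed + .bim + .fam
    elif [ -f "$1.ped" ] && [ -f "$1.map" ]; then
        echo "--file"    # text fileset: .ped + .map
    else
        echo "no complete fileset found for prefix $1 in $(pwd)" >&2
        return 1
    fi
}

# Example use (not run here):
#   ./plink "$(input_flag myData)" myData --recode --out myDataOut
```

The failure branch also covers the "Failed to open" situation discussed earlier in the comments, where a file with the expected name is missing from the working directory.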
@ademolaaina6059 2 years ago
Hi Prof. Did you find a way to convert VCF to ped/map?
@GenomicsBootCamp 2 years ago
Hi! Yes, there is a video on it on the channel: Convert between PLINK to VCF file formats (Remake) kzbin.info/www/bejne/e3unnKGofaaejtU