Got a question on the topic? Please share it in the comment section below and our experts will answer it for you. For the Edureka Hadoop Training and Certification curriculum, visit our website: bit.ly/2Ozdh1I
@filipesan 7 years ago
Thank you, from Portugal! I am studying for my exam on "Big Data Systems" and missed the class on Hadoop/Pig (the problem of being a working student). Now I think I've got it clearly!
@edurekaIN 7 years ago
Hey Filipe, thank you for watching our video. We are glad to have helped you here. You should check out the courses we provide on our website: www.edureka.co Hope you find this useful as well. Cheers :)
@sumansetty3574 7 years ago
Vineeth was really a fabulous presenter; the way he explains is amazing, and it went into my head directly without any confusion. Thanks a lot, sir... expecting more from you, and I need more Pig videos.
@SrijanChakraborty 5 years ago
Brilliant. Just what I needed
@kunjalsujalshah1992 3 years ago
Excellent teaching
@harshiniprasad7738 6 years ago
I am very thankful to this team. I thought big data was a very boring subject and no one was going to make it easy for me to grasp, but edureka did 😃
@edurekaIN 6 years ago
Hey Harshini, thank you for appreciating our work. Do subscribe and stay connected with us. Cheers :)
@ketanpatil3489 7 years ago
Good presentation. Thanks Edureka team!!
@niloychatterjee1603 4 years ago
Brilliant presentation...
@aartichugh5975 6 years ago
Thanks for explaining every bit of running the PIG script.
@edurekaIN 6 years ago
Hey Aarti, thank you for watching our video. Do subscribe, like and share to stay connected with us. Cheers :)
@gokulr94 7 years ago
Very helpful, thanks to edureka.
@greatmonk 5 years ago
Great video, sir!! Really enjoyed the class!!
@shubhambhatnagar007 7 years ago
Very good presentation, thank you so much edureka....
@anirbansarkar6306 3 years ago
Thanks edureka, this was really a great tutorial.
@edurekaIN 3 years ago
Hi :) We really are glad to hear this! It truly feels good that our team is delivering and making your learning easier :) Keep learning with us and stay connected with our channel and team :) Do subscribe to the channel for more updates, and hit the bell icon to never miss an update from our channel :)
@himbisht08 8 years ago
Very nice video. Can you please tell which is more popular in the market, Pig or Hive, from a job perspective?
@edurekaIN 8 years ago
Hey Himanshu, thanks for checking out our tutorial! We cannot say for sure which one is more popular. For example, Facebook uses Hive, whereas Yahoo, which has one of the biggest clusters in the world, uses Pig.

If you know SQL, then Hive will be very familiar to you. Since Hive uses SQL, you will feel at home with select, where, group by, and order by clauses, similar to SQL for relational databases. You do, however, lose some ability to optimize the query by relying on the Hive optimizer. This seems to be the case for any implementation of SQL on any platform, Hadoop or traditional RDBMS, where hints are sometimes ironically needed to teach the automatic optimizer how to optimize properly.

Compared to Hive, Pig needs some mental adjustment for SQL users. Pig Latin has many of the usual data processing concepts that SQL has, such as filtering, selecting, grouping, and ordering, but the syntax is a little different from SQL (particularly the group by and flatten statements!). Pig requires more verbose coding, although it's still a fraction of what straight Java MapReduce programs require. Pig also gives you more control and optimization over the flow of the data than Hive does. Hope this helps you make the right decision. Cheers!
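To make the syntax difference concrete, here is a rough sketch of the kind of filter/group/order pipeline a SQL user would write as a single SELECT statement, expressed as an explicit Pig Latin data flow. The employees file, its path, and its columns are hypothetical, just for illustration:

```pig
-- In Hive/SQL this would be roughly:
--   SELECT dept, COUNT(*) AS cnt FROM employees
--   WHERE salary > 50000 GROUP BY dept ORDER BY cnt DESC;
emps    = LOAD '/employees' USING PigStorage('\t')
          AS (name:chararray, dept:chararray, salary:int);
high    = FILTER emps BY salary > 50000;      -- WHERE clause
by_dept = GROUP high BY dept;                 -- GROUP BY
counts  = FOREACH by_dept GENERATE
          group AS dept,
          COUNT(high) AS cnt;                 -- the aggregate
ordered = ORDER counts BY cnt DESC;           -- ORDER BY
DUMP ordered;
```

Each intermediate alias can be inspected with DUMP, which is where the extra control over the data flow comes from.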
@Dipenparmar12 5 years ago
Great explanation.. keep it up.. thanks.
@edurekaIN 5 years ago
Thanks for the compliment! We are glad you loved the video. Do subscribe, like and share to stay connected with us. Cheers!
@sudhanshumathur725 6 years ago
Very well explained.
@edurekaIN 6 years ago
Thank you for watching our video. Do subscribe, like and share to stay connected with us. Cheers :)
@sumitarora6429 6 years ago
Thank you so much, sir.
@abhishekpandey2148 8 years ago
happy new year to dear trainer :)
@edurekaIN 8 years ago
Hey Abhishek, thanks for checking out our tutorial and for the wishes. Happy New Year to you too, from the trainer and from Team Edureka! :) Also, do check out this tutorial: kzbin.info/www/bejne/aqu7pJqvg5mNetE. We thought you might like it too. Cheers!
@rhce2120 7 years ago
Thanks a lot, sir.
@thepriestofvaranasi 2 years ago
Sir, can you share the version of the Cloudera QuickStart VM that you're using? And it would be helpful if you could share a video of how to install it.
@edurekaIN 2 years ago
Thanks for showing interest in Edureka! Kindly visit the channel for more videos; our content creators are eagerly waiting for your suggestions for new videos on topics of your interest :) Do subscribe for video updates.
@srinivasvemula1963 6 years ago
Thank you edureka.
@sarojsahu539 5 years ago
Superb, sir!!
@maryjain1762 4 years ago
Good class.
@vishwajitbhagat9515 3 years ago
Great stuff. Can I get that log file?
@edurekaIN 3 years ago
Hi, kindly drop in your email id to help us assist you with the required files for your reference. Cheers :)
@abhishekbhatia8887 8 years ago
Nice explanation. Can we get an advanced Pig tutorial?
@edurekaIN 8 years ago
Hey Abhishek, thanks for checking out our tutorial! Could you please let us know which Pig topics you are looking for so we can help you better? Cheers!
@ankitsaxenamusic 7 years ago
This is a wonderful tutorial with a detailed explanation. I just have a query about the sample.log file: what are the parameters in REGEX_EXTRACT? Can you please explain in detail what $0 is and what the 1 is in REGEX_EXTRACT? Thank you so much for your videos. Keep the good work going :)
@edurekaIN 7 years ago
Hey Ankit, thanks for the wonderful feedback! We're glad you found our tutorial useful. Here's the explanation as requested.

REGEX_EXTRACT performs regular expression matching and extracts the matched group defined by an index parameter.

Syntax: REGEX_EXTRACT(string, regex, index)
Terms:
string - the string in which to perform the match.
regex - the regular expression.
index - the index of the matched group to return (1-based).

The function uses Java regular expression form and returns a string that corresponds to the matched group in the position specified by the index. If there is no matched expression at that position, NULL is returned. In the tutorial's script, $0 is Pig's positional reference to the first field of each loaded record, i.e. the whole log line, and the final argument 1 asks for the first capture group of the pattern.

Example: REGEX_EXTRACT('192.168.1.5:8020', '(.*):(.*)', 1); returns the string '192.168.1.5'.

Hope this helps. Cheers!
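As a quick sketch of how the index parameter selects different capture groups (using the same example string as above; the input path and column name are hypothetical):

```pig
-- each input line is assumed to be a host:port string, e.g. '192.168.1.5:8020'
endpoints = LOAD '/endpoints' AS (line:chararray);
parts = FOREACH endpoints GENERATE
        REGEX_EXTRACT(line, '(.*):(.*)', 1) AS host,  -- group 1: text before the colon
        REGEX_EXTRACT(line, '(.*):(.*)', 2) AS port;  -- group 2: text after the colon
DUMP parts;
```

For the line '192.168.1.5:8020', group 1 yields '192.168.1.5' and group 2 yields '8020', per the documented 1-based behaviour.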
@avnish.dixit_ 5 years ago
Nice video
@sanjeevpandey2753 6 years ago
Thanks, sir.
@edurekaIN 6 years ago
Hey Sanjeev, thank you for watching our video. Do subscribe, like and share to stay connected with us. Cheers :)
@agodavarthy 5 years ago
Can we do data processing like creating a dictionary (like in Python) using PIG?
@edurekaIN 5 years ago
A Python dictionary lets you temporarily store data as key-value pairs and retrieve items very fast by their key. Pig has a comparable built-in data type, the map, which stores key#value pairs and is the closest analogue of a Python dictionary, so yes, you can do dictionary-style processing in Pig.
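Pig's built-in map type is the nearest thing to a Python dictionary: keys are looked up with the # operator much like dict[key]. A minimal sketch, assuming a hypothetical input file whose single column holds Pig map literals such as [name#Alice,city#Lisbon]:

```pig
-- each input line holds a map literal, e.g. [name#Alice,city#Lisbon]
records = LOAD '/users' AS (info:map[]);
-- look up values by key with #, much like dict['name'] in Python
names = FOREACH records GENERATE info#'name' AS name, info#'city' AS city;
DUMP names;
```

Missing keys yield NULL rather than raising an error, which is one practical difference from Python's dict[key].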
@rpattnaik2000 5 years ago
Good one!!
@sharonrosy9519 6 years ago
Thank you, sir.
@tejaswinisana1405 8 years ago
Hello sir, which is better, Pig or MapReduce, in terms of processing speed?
@edurekaIN 8 years ago
Hey Tejaswini, thanks for checking out our tutorial. Here's the answer to your query: the two are different things. Pig is a data analysis language used to create MapReduce jobs that run on large datasets; the two work in a distributed environment, hand in hand.

Pig is a data flow language: its key focus is managing the flow of data from an input source to an output store. Pig Latin is written specifically for expressing the data flow of MapReduce-type jobs; most if not all jobs in a Pig script are MapReduce jobs or data movement jobs. Pig allows custom functions to be added for processing, and ships with defaults such as ordering, grouping, distinct, count, etc.

MapReduce, on the other hand, is a data processing paradigm: a framework in which application developers write code that scales easily to petabytes of tasks. This creates a separation between the developer who writes the application and the developer who scales it. Not all applications can be migrated to MapReduce, but a good few can, from complex ones like k-means to simple ones like counting uniques in a dataset.

Pig commands are submitted as MapReduce jobs internally. An advantage Pig has over MapReduce is that it is more concise: 200 lines of Java code written for MapReduce can be reduced to about 10 lines of Pig. A disadvantage: Pig is a bit slower than MapReduce, as Pig commands are translated into MapReduce prior to execution. Hope this helps. Cheers!
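The conciseness point is easy to see with the classic word count: the sketch below (the HDFS input path is hypothetical) does in a handful of Pig Latin lines what takes a full mapper class, reducer class, and driver in Java MapReduce:

```pig
lines   = LOAD '/books/input.txt' AS (line:chararray);
-- TOKENIZE splits each line into a bag of words; FLATTEN turns that bag into rows
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS cnt;
DUMP counts;
```

Under the hood this still compiles to MapReduce jobs, which is exactly why it carries the small translation overhead mentioned above.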
@tejaswinisana1405 8 years ago
edureka! Thanks a lot, sir.
@lakshmans779 7 years ago
Hi team, is there any PDF document for Hadoop from Edureka?
@edurekaIN 7 years ago
Hey Lakshman, thanks for checking out our tutorial. Could you please elaborate on what you need in PDF form? If it's the PPT, you can check out related PPTs here: www.slideshare.net/search/slideshow?searchfrom=header&q=pig+tutorial+edureka&ud=any&ft=all&lang=**&sort= You can access our complete training by enrolling into our course here: www.edureka.co/big-data-and-hadoop. Hope this helps. Cheers!
@priyankagauda4420 7 years ago
Great video, sir, but I cannot find the sample.log file. Can you please help?
@edurekaIN 7 years ago
Hey Priyanka, thanks for checking out our tutorial! We're glad you liked it. The files used in this tutorial are Edureka course artifacts that you can avail by enrolling into our course here: www.edureka.co/big-data-and-hadoop. Please feel free to get in touch if you have any questions or need any assistance. Hope this helps. Cheers!
@user-bo7iz1mi6h 7 years ago
How have you moved the data into Hadoop? I did not get it.
@edurekaIN 6 years ago
Hey, sorry for the delay. Use hdfs dfs -put <local path> <HDFS path> to copy a local file into HDFS, e.g. hdfs dfs -put sample.log /sample.log. Hope this helps. Cheers!
@jenijohn876 7 years ago
Sir, very good presentation, very clear to understand. Where can I find the log file? Can you please send it to my mail ID?
@edurekaIN 7 years ago
Hey John! You can mention your email address in the comments and we will mail it to you.
@ravijariwala9758 7 years ago
yes
@kashishkhetarpaul3214 7 years ago
How can we get this log file?
@edurekaIN 7 years ago
Hey Kashish! Send in your email ID here and we will send you the log files.
@vivekkvr 7 years ago
Hi, it's a nice tutorial about Pig. I just want to know in which cases Pig is best used over Hive in real-time scenarios?
@edurekaIN 7 years ago
Hey Vivek, thanks for checking out our tutorial! We're glad you liked it. You can use Pig where your data is unstructured (it does not have a schema). Pig does not require you to give the schema of a file at the time you are loading (writing) it onto HDFS; it follows schema-on-read, whereas Hive simulates SQL-like behaviour over HDFS (which means schema-on-write). Suppose you have to process a novel written by Shakespeare or a speech given by Donald Trump. In that case you will need Pig, as these text files are not structured and you can't write a novel as a table (which requires you to provide a schema). But if you have a table with fixed column names, and in each column the data type remains constant, then you will use Hive. Hope this helps. Cheers!
@shivkumar70 7 years ago
Thanks for posting informative videos. I have tried the Pig script as explained in the video, but it failed. Can you please let me know how to make it succeed?

Content of sampleLog.pig:
log = LOAD '/sample.log';
LEVELS = foreach log generate REGEX_EXTRACT($0, '(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)', 1) as LOGLEVEL;
FILTEREDLEVELS = FILTER LEVELS by LOGLEVEL is not null;
GROUPEDLEVELS = GROUP FILTEREDLEVELS by LOGLEVEL;
FREQUENCIES = foreach GROUPEDLEVELS generate group as LOGLEVEL, COUNT(FILTEREDLEVELS.LOGLEVEL) as COUNT;
RESULT = order FREQUENCIES by COUNT desc;
DUMP RESULT;

hduser@ubuntu:~$ pig /home/hduser/HDFS_Practice_Dir/new_edureka/sampleLog.pig

Failed Jobs:
JobId: job_1491887529789_0011
Alias: FILTEREDLEVELS,FREQUENCIES,GROUPEDLEVELS,LEVELS,log
Feature: GROUP_BY,COMBINER
Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://localhost:9000/sample.log
. . . .
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:9000/sample.log

Input(s): Failed to read data from "/sample.log"
Output(s):
Counters:
Total records written: 0
Total bytes written: 0
Spillable Memory Manager spill count: 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG: job_1491887529789_0011 -> null, null -> null, null

2017-04-10 23:24:32,688 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2017-04-10 23:24:32,697 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias RESULT
Details at logfile: /home/hduser/pig_1491891860556.log

Log file content:
Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias RESULT
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias RESULT
. . .
Caused by: java.io.IOException: Couldn't retrieve job.
at org.apache.pig.PigServer.store(PigServer.java:1083)
at org.apache.pig.PigServer.openIterator(PigServer.java:994)
... 13 more
@edurekaIN 7 years ago
Hey Shiva Kumar, thanks for checking out our tutorial. We're glad you liked it. The error is self-explanatory: "Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://localhost:9000/sample.log" clearly states that your input file path is wrong and sample.log does not exist at that location. The reason it did not give an error when you entered 'log = LOAD '/sample.log';' is that Pig starts a MapReduce job only when you issue a DUMP statement. When you typed DUMP, it started the MapReduce job and found the error in the first line of your Pig script. Try checking whether the file really exists at "hdfs://localhost:9000/sample.log", for example with hdfs dfs -ls /sample.log, and copy it there with hdfs dfs -put if it is missing. Hope this helps solve the issue. Cheers!