Got a question on the topic? Please share it in the comment section below and our experts will answer it for you. For the Edureka Hadoop Training and Certification curriculum, visit our website: bit.ly/2Ozdh1I
@filipesan 7 years ago
Thank you, from Portugal! I am studying for my exam on "Big Data Systems" and missed the class on Hadoop/Pig (the problem of being a working student). Now I think I've got it clearly!
@edurekaIN 7 years ago
Hey Filipe, thank you for watching our video. We are glad to have helped you here. You should check out the courses we provide on our website: www.edureka.co Hope you find this useful as well. Cheers :)
@sumansetty3574 7 years ago
Vineeth was really a fabulous presenter; the way he explains is amazing, and it went into my head directly without any confusion. Thanks a lot, sir... expecting more from you, and I need more Pig videos.
@SrijanChakraborty 5 years ago
Brilliant. Just what I needed
@kunjalsujalshah1992 3 years ago
Excellent teaching
@harshiniprasad7738 6 years ago
I am very thankful to this team. I thought big data was a very boring subject and no one was going to make it easy for me to grasp, but edureka did 😃
@edurekaIN 6 years ago
Hey Harshini, thank you for appreciating our work. Do subscribe and stay connected with us. Cheers :)
@ketanpatil3489 7 years ago
Good presentation. Thanks Edureka team!!
@niloychatterjee1603 4 years ago
Brilliant presentation...
@aartichugh5975 6 years ago
Thanks for explaining every bit of running the PIG script.
@edurekaIN 6 years ago
Hey Aarti, thank you for watching our video. Do subscribe, like and share to stay connected with us. Cheers :)
@gokulr94 7 years ago
Very helpful, thanks to edureka.
@greatmonk 5 years ago
Great video, sir!! Really enjoyed the class!!
@shubhambhatnagar007 7 years ago
Very good presentation, thank you so much edureka....
@anirbansarkar6306 3 years ago
Thanks edureka, this was really a great tutorial.
@edurekaIN 3 years ago
Hi :) We really are glad to hear this! It truly feels good that our team is delivering and making your learning easier :) Keep learning with us and stay connected with our channel and team :) Do subscribe to the channel for more updates, and hit the bell icon to never miss an update from our channel :)
@himbisht08 8 years ago
Very nice video. Can you please tell which is more popular in the market, Pig or Hive, from a job perspective?
@edurekaIN 8 years ago
Hey Himanshu, thanks for checking out our tutorial! We cannot say for sure which one is more popular. For example, Facebook uses Hive, whereas Yahoo, which has one of the biggest clusters in the world, uses Pig.

If you know SQL, then Hive will be very familiar to you. Since Hive uses SQL, you will feel at home with select, where, group by, and order by clauses, similar to SQL for relational databases. You do, however, lose some ability to optimize the query by relying on the Hive optimizer. This seems to be the case for any implementation of SQL on any platform, Hadoop or traditional RDBMS, where hints are sometimes ironically needed to teach the automatic optimizer how to optimize properly.

Compared to Hive, Pig needs some mental adjustment for SQL users. Pig Latin has many of the usual data processing concepts that SQL has, such as filtering, selecting, grouping, and ordering, but the syntax is a little different from SQL (particularly the group by and flatten statements!). Pig requires more verbose coding, although it's still a fraction of what straight Java MapReduce programs require. Pig also gives you more control and optimization over the flow of the data than Hive does. Hope this helps you make the right decision. Cheers!
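To make the syntax difference concrete, here is a rough sketch of the kind of filter/group/order pipeline a SQL user would write as a single SELECT statement, expressed as an explicit Pig Latin data flow. The employees file, its path, and its columns are hypothetical, just for illustration:

```pig
-- In Hive/SQL this would be roughly:
--   SELECT dept, COUNT(*) AS cnt FROM employees
--   WHERE salary > 50000 GROUP BY dept ORDER BY cnt DESC;
emps    = LOAD '/employees' USING PigStorage('\t')
          AS (name:chararray, dept:chararray, salary:int);
high    = FILTER emps BY salary > 50000;      -- WHERE clause
by_dept = GROUP high BY dept;                 -- GROUP BY
counts  = FOREACH by_dept GENERATE
          group AS dept,
          COUNT(high) AS cnt;                 -- the aggregate
ordered = ORDER counts BY cnt DESC;           -- ORDER BY
DUMP ordered;
```

Each intermediate alias can be inspected with DUMP, which is where the extra control over the data flow comes from.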
@Dipenparmar12 5 years ago
Great explanation.. keep it up.. thanks.
@edurekaIN 5 years ago
Thanks for the compliment! We are glad you loved the video. Do subscribe, like and share to stay connected with us. Cheers!
@sudhanshumathur725 6 years ago
Very well explained.
@edurekaIN 6 years ago
Thank you for watching our video. Do subscribe, like and share to stay connected with us. Cheers :)
@sumitarora6429 6 years ago
Thank you so much, sir.
@abhishekpandey2148 8 years ago
happy new year to dear trainer :)
@edurekaIN 8 years ago
Hey Abhishek, thanks for checking out our tutorial and for the wishes. Happy New Year to you too, from the trainer and from Team Edureka! :) Also, do check out this tutorial: kzbin.info/www/bejne/aqu7pJqvg5mNetE. We thought you might like it too. Cheers!
@rhce2120 7 years ago
Thanks a lot, sir.
@thepriestofvaranasi 2 years ago
Sir, can you share the version of the Cloudera QuickStart VM that you're using? And it would be helpful if you could share a video of how to install it.
@edurekaIN 2 years ago
Thanks for showing interest in Edureka! Kindly visit the channel for more videos; our content creators are eagerly waiting for your suggestions for new videos on topics of your interest :) Do subscribe for video updates.
@srinivasvemula1963 6 years ago
Thank you edureka.
@sarojsahu539 5 years ago
Superb, sir!!
@maryjain1762 4 years ago
Good class.
@vishwajitbhagat9515 3 years ago
Great stuff. Can I get that log file?
@edurekaIN 3 years ago
Hi, kindly drop in your email id to help us assist you with the required files for your reference. Cheers :)
@abhishekbhatia8887 8 years ago
Nice explanation. Can we get an advanced Pig tutorial?
@edurekaIN 8 years ago
Hey Abhishek, thanks for checking out our tutorial! Could you please let us know which Pig topics you are looking for so we can help you better? Cheers!
@ankitsaxenamusic 7 years ago
This is a wonderful tutorial with a detailed explanation. I just have a query about the sample.log file: what are the parameters in REGEX_EXTRACT? Can you please explain in detail what $0 is and what the 1 is in REGEX_EXTRACT? Thank you so much for your videos. Keep the good work going :)
@edurekaIN 7 years ago
Hey Ankit, thanks for the wonderful feedback! We're glad you found our tutorial useful. Here's the explanation as requested.

REGEX_EXTRACT performs regular expression matching and extracts the matched group defined by an index parameter.

Syntax: REGEX_EXTRACT(string, regex, index)
Terms:
string - the string in which to perform the match.
regex - the regular expression.
index - the index of the matched group to return (1-based).

The function uses Java regular expression form and returns a string that corresponds to the matched group in the position specified by the index. If there is no matched expression at that position, NULL is returned. In the tutorial's script, $0 is Pig's positional reference to the first field of each loaded record, i.e. the whole log line, and the final argument 1 asks for the first capture group of the pattern.

Example: REGEX_EXTRACT('192.168.1.5:8020', '(.*):(.*)', 1); returns the string '192.168.1.5'.

Hope this helps. Cheers!
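As a quick sketch of how the index parameter selects different capture groups (using the same example string as above; the input path and column name are hypothetical):

```pig
-- each input line is assumed to be a host:port string, e.g. '192.168.1.5:8020'
endpoints = LOAD '/endpoints' AS (line:chararray);
parts = FOREACH endpoints GENERATE
        REGEX_EXTRACT(line, '(.*):(.*)', 1) AS host,  -- group 1: text before the colon
        REGEX_EXTRACT(line, '(.*):(.*)', 2) AS port;  -- group 2: text after the colon
DUMP parts;
```

For the line '192.168.1.5:8020', group 1 yields '192.168.1.5' and group 2 yields '8020', per the documented 1-based behaviour.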
@avnish.dixit_ 5 years ago
Nice video
@sanjeevpandey2753 6 years ago
Thanks, sir.
@edurekaIN 6 years ago
Hey Sanjeev, thank you for watching our video. Do subscribe, like and share to stay connected with us. Cheers :)
@agodavarthy 5 years ago
Can we do data processing like creating a dictionary (like in Python) using PIG?
@edurekaIN 5 years ago
A Python dictionary lets you temporarily store data as key-value pairs and retrieve items very fast by their key. Pig has a comparable built-in data type, the map, which stores key#value pairs and is the closest analogue of a Python dictionary, so yes, you can do dictionary-style processing in Pig.
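Pig's built-in map type is the nearest thing to a Python dictionary: keys are looked up with the # operator much like dict[key]. A minimal sketch, assuming a hypothetical input file whose single column holds Pig map literals such as [name#Alice,city#Lisbon]:

```pig
-- each input line holds a map literal, e.g. [name#Alice,city#Lisbon]
records = LOAD '/users' AS (info:map[]);
-- look up values by key with #, much like dict['name'] in Python
names = FOREACH records GENERATE info#'name' AS name, info#'city' AS city;
DUMP names;
```

Missing keys yield NULL rather than raising an error, which is one practical difference from Python's dict[key].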
@rpattnaik2000 5 years ago
Good one!!
@sharonrosy9519 6 years ago
Thank you, sir.
@tejaswinisana1405 8 years ago
Hello sir, which is better, Pig or MapReduce, in terms of processing speed?
@edurekaIN 8 years ago
Hey Tejaswini, thanks for checking out our tutorial. Here's the answer to your query: the two are different things. Pig is a data analysis language used to create MapReduce jobs that run on large datasets; the two work in a distributed environment, hand in hand.

Pig is a data flow language: its key focus is managing the flow of data from an input source to an output store. Pig Latin is written specifically for expressing the data flow of MapReduce-type jobs; most if not all jobs in a Pig script are MapReduce jobs or data movement jobs. Pig allows custom functions to be added for processing, and ships with defaults such as ordering, grouping, distinct, count, etc.

MapReduce, on the other hand, is a data processing paradigm: a framework in which application developers write code that scales easily to petabytes of tasks. This creates a separation between the developer who writes the application and the developer who scales it. Not all applications can be migrated to MapReduce, but a good few can, from complex ones like k-means to simple ones like counting uniques in a dataset.

Pig commands are submitted as MapReduce jobs internally. An advantage Pig has over MapReduce is that it is more concise: 200 lines of Java code written for MapReduce can be reduced to about 10 lines of Pig. A disadvantage: Pig is a bit slower than MapReduce, as Pig commands are translated into MapReduce prior to execution. Hope this helps. Cheers!
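The conciseness point is easy to see with the classic word count: the sketch below (the HDFS input path is hypothetical) does in a handful of Pig Latin lines what takes a full mapper class, reducer class, and driver in Java MapReduce:

```pig
lines   = LOAD '/books/input.txt' AS (line:chararray);
-- TOKENIZE splits each line into a bag of words; FLATTEN turns that bag into rows
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS cnt;
DUMP counts;
```

Under the hood this still compiles to MapReduce jobs, which is exactly why it carries the small translation overhead mentioned above.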
@tejaswinisana1405 8 years ago
edureka! Thanks a lot, sir.
@lakshmans779 7 years ago
Hi team, is there any PDF document for Hadoop from Edureka?
@edurekaIN 7 years ago
Hey Lakshman, thanks for checking out our tutorial. Could you please elaborate on what you need in PDF form? If it's the PPT, you can check out related PPTs here: www.slideshare.net/search/slideshow?searchfrom=header&q=pig+tutorial+edureka&ud=any&ft=all&lang=**&sort= You can access our complete training by enrolling into our course here: www.edureka.co/big-data-and-hadoop. Hope this helps. Cheers!
@priyankagauda4420 7 years ago
Great video, sir, but I cannot find the sample.log file. Can you please help?
@edurekaIN 7 years ago
Hey Priyanka, thanks for checking out our tutorial! We're glad you liked it. The files used in this tutorial are Edureka course artifacts that you can avail by enrolling into our course here: www.edureka.co/big-data-and-hadoop. Please feel free to get in touch if you have any questions or need any assistance. Hope this helps. Cheers!
@user-bo7iz1mi6h 7 years ago
How have you moved the data into Hadoop? I did not get it.
@edurekaIN 6 years ago
Hey, sorry for the delay. Use hdfs dfs -put <local path> <HDFS path> to copy a local file into HDFS, e.g. hdfs dfs -put sample.log /sample.log. Hope this helps. Cheers!
@jenijohn876 7 years ago
Sir, very good presentation, very clear to understand. Where can I find the log file? Can you please send it to my mail ID?
@edurekaIN 7 years ago
Hey John! You can mention your email address in the comments and we will mail it to you.
@ravijariwala9758 7 years ago
yes
@kashishkhetarpaul3214 7 years ago
How can we get this log file?
@edurekaIN 7 years ago
Hey Kashish! Send in your email ID here and we will send you the log files.
@vivekkvr 7 years ago
Hi, it's a nice tutorial about Pig. I just want to know in which cases Pig is best used over Hive in real-time scenarios?
@edurekaIN 7 years ago
Hey Vivek, thanks for checking out our tutorial! We're glad you liked it. You can use Pig where your data is unstructured (it does not have a schema). Pig does not require you to give the schema of a file at the time you are loading (writing) it onto HDFS; it follows schema-on-read, whereas Hive simulates SQL-like behaviour over HDFS (which means schema-on-write). Suppose you have to process a novel written by Shakespeare or a speech given by Donald Trump. In that case you will need Pig, as these text files are not structured and you can't write a novel as a table (which requires you to provide a schema). But if you have a table with fixed column names, and in each column the data type remains constant, then you will use Hive. Hope this helps. Cheers!
@shivkumar70 7 years ago
Thanks for posting informative videos. I have tried the Pig script as explained in the video, but it failed. Can you please let me know how to make it succeed?

Content of sampleLog.pig:
log = LOAD '/sample.log';
LEVELS = foreach log generate REGEX_EXTRACT($0, '(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)', 1) as LOGLEVEL;
FILTEREDLEVELS = FILTER LEVELS by LOGLEVEL is not null;
GROUPEDLEVELS = GROUP FILTEREDLEVELS by LOGLEVEL;
FREQUENCIES = foreach GROUPEDLEVELS generate group as LOGLEVEL, COUNT(FILTEREDLEVELS.LOGLEVEL) as COUNT;
RESULT = order FREQUENCIES by COUNT desc;
DUMP RESULT;

hduser@ubuntu:~$ pig /home/hduser/HDFS_Practice_Dir/new_edureka/sampleLog.pig

Failed Jobs:
JobId: job_1491887529789_0011
Alias: FILTEREDLEVELS,FREQUENCIES,GROUPEDLEVELS,LEVELS,log
Feature: GROUP_BY,COMBINER
Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://localhost:9000/sample.log
. . . .
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:9000/sample.log

Input(s): Failed to read data from "/sample.log"
Output(s):
Counters:
Total records written: 0
Total bytes written: 0
Spillable Memory Manager spill count: 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG: job_1491887529789_0011 -> null, null -> null, null

2017-04-10 23:24:32,688 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2017-04-10 23:24:32,697 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias RESULT
Details at logfile: /home/hduser/pig_1491891860556.log

Log file content:
Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias RESULT
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias RESULT
. . .
Caused by: java.io.IOException: Couldn't retrieve job.
at org.apache.pig.PigServer.store(PigServer.java:1083)
at org.apache.pig.PigServer.openIterator(PigServer.java:994)
... 13 more
@edurekaIN 7 years ago
Hey Shiva Kumar, thanks for checking out our tutorial. We're glad you liked it. The error is self-explanatory: "Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://localhost:9000/sample.log" clearly states that your input file path is wrong and sample.log does not exist at that location. The reason it did not give an error when you entered 'log = LOAD '/sample.log';' is that Pig starts a MapReduce job only when you issue a DUMP statement. When you typed DUMP, it started the MapReduce job and found the error in the first line of your Pig script. Try checking whether the file really exists at "hdfs://localhost:9000/sample.log", for example with hdfs dfs -ls /sample.log, and copy it there with hdfs dfs -put if it is missing. Hope this helps solve the issue. Cheers!