Solution Manual Big Data Analytics 17CS82 VTU CBCS

Solution Manual to Big Data Analytics 17CS82 VTU CBCS Question Bank


This article contains the solution manual for Big Data Analytics (17CS82), Computer Science and Engineering.

8th Semester Big Data Analytics notes Computer Science and Engineering

Solution Manual Big Data Analytics 17CS82 VTU CBCS Module 1 Question Bank

1. With a neat diagram, explain the components of the Hadoop Distributed File System (HDFS). (08 M Nov 2020)

2. Write and explain the mapper and reducer scripts for the MapReduce model. (08 M Nov 2020)

3. With a neat diagram, describe the steps in the MapReduce parallel flow data model. (08 M Nov 2020, Jan 2020)

4. Explain the following roles in HDFS deployment with a diagram i) High Availability ii) Name Node Federation. (08 M Nov 2020)

5. How does the Hadoop MapReduce data flow work for a word count program? Give an example. (8 M July 2019)

6. Briefly explain HDFS Name Node federation, NFS Gateway, Snapshots, Checkpoint, and backups. (8 M July 2019)

7. What do you understand by HDFS? Explain the components with a neat diagram. (10 M July 2019)

8. Bring out the concept of HDFS block replication with an example. (6 M July 2019)

9. With an example explain the different general HDFS commands.

10. Write the Java code for the map and reduce steps of the word count problem. Describe the steps of compiling and running the MapReduce program. (9 M Jan 2020)

11. What is HDFS? List the components of HDFS and explain any four of them. (9 M Jan 2020)
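
For the word count questions in this module (2, 5 and 10), the map, shuffle and reduce stages can be traced on a small input before writing the Hadoop version. The sketch below is a plain-Python simulation, not Hadoop code: the three input lines and the function names `mapper`, `shuffle` and `reducer` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical input splits; each string stands for one line of a text file.
LINES = ["see spot run", "run spot run", "see the cat"]


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle/sort step: group all values by key, ordered by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reducer(word, counts):
    """Reduce step: sum the counts collected for one word."""
    return word, sum(counts)


mapped = [pair for line in LINES for pair in mapper(line)]
grouped = shuffle(mapped)
word_counts = dict(reducer(word, counts) for word, counts in grouped.items())

for word, count in word_counts.items():
    print(word, count)
```

In a real job the framework performs the shuffle between the map and reduce tasks; here it is written out so the intermediate key/value groups can be inspected.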

Big Data Analytics Module 2 Question Bank with Answers

1. What is the significance of Apache Pig in the Hadoop context? Describe the main components and the working of Apache Pig with a simple example. (08 M Nov 2020)

2. Explain the features and benefits of Apache Hive in Hadoop. (08 M Nov 2020)

3. With neat diagrams, explain the Oozie DAG workflow and the types of nodes in the workflow. (08 M Nov 2020)

4. What is Apache Flume? Describe the features, components, and working of Apache Flume. (08 M Nov 2020)

5. Explain the Apache Sqoop export and import method with a neat diagram. (10 M July 2019)

6. Explain with a neat diagram, the Apache Oozie workflow for Hadoop architecture. (6 M July 2019)

7. How do you run MapReduce and Message Passing Interface (MPI) on the YARN architecture? (10 M July 2019)

8. What do you understand by YARN distributed shell? (6 M July 2019)

9. Describe with a neat diagram, the two-step Sqoop data export and import method. (8 M Jan 2020)

10. With a neat diagram, discuss the various frameworks that run under YARN. (8 M Jan 2020)

11. Discuss the different views supported by Apache Ambari. (6 M Jan 2020)

12. Discuss the various features of Hadoop YARN administration. (4 M Jan 2020)

13. Explain the different HDFS administration features. (6 M Jan 2020)

Big Data Analytics Module 3 Question Bank with Answers

1. Draw the flow diagram of BIDM. Explain strategic and operational decisions. (8 M Nov 2020)

2. Write any four Business Intelligence (BI) applications for various sectors. (8 M July 2019)

3. Explain the star schema design of a data warehouse with an example. (6 M July 2019)

4. Differentiate between the data mart and the data warehouse based on the following, with justifications: scope, target organization, cost, approach, complexity, and time. (8 M Nov 2020)

5. Describe any 8 design considerations for a data warehouse and explain key elements with diagrammatic representations. (08 M Nov 2020)

6. What is a confusion matrix? Explain. (2 M July 2019)

7. Explain with diagram CRISP-DM data mining cycle. (8 M July 2019, Jan 2020, Nov 2020)

8. What do you understand by the term data visualization? How is it important in Big Data Analytics? (5 M July 2019)

9. Differentiate between Data Mining and Data Warehousing. (3 M July 2019)

10. What is Business Intelligence (BI)? List the different BI applications and explain any five applications in detail. (10 M Jan 2020)

11. With a neat diagram explain data warehouse architecture. (6 M Jan 2020)

12. Describe the common data mining mistakes. (4 M Jan 2020)

13. Describe the common data mining myths.

14. List and describe the various charts used for data visualization. (4 M Jan 2020)

15. What is data mining? Explain steps in data cleaning and preparation.

Big Data Analytics Module 4 Question Bank with Answers

1. What is a splitting variable? Describe three criteria for choosing a splitting variable. (4 M July 2019)

2. List the advantages and disadvantages of a regression model. (4 M July 2019)

3. Explain the steps and three differentiating criteria of a decision tree algorithm. Construct the decision tree for the following dataset and predict the outcome for the given query. (10 M Nov 2020, 8 M 2020)

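| Outlook | Temp | Humidity | Windy | Play |
|---|---|---|---|---|
| Sunny | Hot | High | False | No |
| Sunny | Hot | High | True | No |
| Overcast | Hot | High | False | Yes |
| Rainy | Mild | High | False | Yes |
| Rainy | Cool | Normal | False | Yes |
| Rainy | Cool | Normal | True | No |
| Overcast | Cool | Normal | True | Yes |
| Sunny | Mild | High | False | No |
| Sunny | Cool | Normal | False | Yes |
| Rainy | Mild | Normal | False | Yes |
| Sunny | Mild | Normal | True | Yes |
| Overcast | Mild | High | True | Yes |
| Overcast | Hot | Normal | False | Yes |
| Rainy | Mild | High | True | No |

| Outlook | Temp | Humidity | Windy | Play |
|---|---|---|---|---|
| Sunny | Hot | Normal | True | ? |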
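
One common criterion for choosing the splitting variable is information gain. The sketch below computes it for each attribute of the weather table above, so a hand calculation of the root node can be checked. It assumes entropy-based gain; a course solution may use a different criterion (for example, error counts or the Gini index). The names `entropy` and `information_gain` are illustrative.

```python
from collections import Counter
from math import log2

COLUMNS = ["Outlook", "Temp", "Humidity", "Windy"]

# Training rows copied from the table above; the last field is the Play label.
ROWS = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "High", "True", "No"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr_index):
    """Entropy of the whole set minus the weighted entropy after the split."""
    remainder = 0.0
    for value in {row[attr_index] for row in rows}:
        subset = [row[-1] for row in rows if row[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([row[-1] for row in rows]) - remainder


gains = {name: information_gain(ROWS, i) for i, name in enumerate(COLUMNS)}
best = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")
print("Attribute with the largest gain:", best)
```

Under this criterion Outlook has the largest gain (about 0.247), so it would be placed at the root; the same function can then be applied to each Outlook subset to grow the next level.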

4. Create a decision tree for the following dataset and predict whether the loan is approved or not. (8 M July 2019)

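| Age | Job | House | Credit | Loan Approved |
|---|---|---|---|---|
| Young | False | No | Fair | No |
| Young | False | No | Good | No |
| Young | True | No | Good | Yes |
| Young | True | Yes | Fair | Yes |
| Young | False | No | Fair | No |
| Middle | False | No | Fair | No |
| Middle | False | No | Good | No |
| Middle | True | Yes | Good | Yes |
| Middle | False | Yes | Excellent | Yes |
| Middle | False | Yes | Excellent | Yes |
| Old | False | Yes | Excellent | Yes |
| Old | False | Yes | Good | Yes |
| Old | True | No | Good | Yes |
| Old | True | No | Excellent | Yes |
| Old | False | No | Fair | No |

| Age | Job | House | Credit | Loan Approved |
|---|---|---|---|---|
| Young | False | No | Good | ? |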
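
A complete tree for the loan table above can be grown recursively. The following is a minimal ID3-style sketch, assuming information gain as the splitting criterion and a majority-class fallback for unseen values; the function names `build` and `predict` and the dictionary layout of the tree are choices made for this example, not a prescribed solution format.

```python
from collections import Counter
from math import log2

ATTRS = ["Age", "Job", "House", "Credit"]

# Rows copied from the table above; the last field is the Loan Approved label.
DATA = [
    ("Young", "False", "No", "Fair", "No"),
    ("Young", "False", "No", "Good", "No"),
    ("Young", "True", "No", "Good", "Yes"),
    ("Young", "True", "Yes", "Fair", "Yes"),
    ("Young", "False", "No", "Fair", "No"),
    ("Middle", "False", "No", "Fair", "No"),
    ("Middle", "False", "No", "Good", "No"),
    ("Middle", "True", "Yes", "Good", "Yes"),
    ("Middle", "False", "Yes", "Excellent", "Yes"),
    ("Middle", "False", "Yes", "Excellent", "Yes"),
    ("Old", "False", "Yes", "Excellent", "Yes"),
    ("Old", "False", "Yes", "Good", "Yes"),
    ("Old", "True", "No", "Good", "Yes"),
    ("Old", "True", "No", "Excellent", "Yes"),
    ("Old", "False", "No", "Fair", "No"),
]


def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def gain(rows, i):
    """Information gain obtained by splitting rows on attribute index i."""
    remainder = 0.0
    for value in {r[i] for r in rows}:
        part = [r[-1] for r in rows if r[i] == value]
        remainder += len(part) / len(rows) * entropy(part)
    return entropy([r[-1] for r in rows]) - remainder


def build(rows, attr_idx):
    """Return a leaf label, or a dict node with one branch per attribute value."""
    labels = [r[-1] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attr_idx:
        return majority
    best = max(attr_idx, key=lambda i: gain(rows, i))
    rest = [i for i in attr_idx if i != best]
    branches = {
        value: build([r for r in rows if r[best] == value], rest)
        for value in sorted({r[best] for r in rows})
    }
    return {"attr": best, "branches": branches, "default": majority}


def predict(tree, record):
    """Walk from the root to a leaf; fall back to the node's majority label."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(record[tree["attr"]], tree["default"])
    return tree


tree = build(DATA, list(range(len(ATTRS))))
query = ("Young", "False", "No", "Good")
print("Root attribute:", ATTRS[tree["attr"]])
print("Predicted Loan Approved:", predict(tree, query))
```

With the rows as listed, the sketch splits first on House and then on Job for applicants without a house, and classifies the query record as "No".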

5. Explain the design principles of an artificial neural network. (8 M July 2019)

6. Explain the design principles of an artificial neural network by constructing a model representation for a single-layer and a multilayer perceptron. Describe the steps to build an artificial neural network (ANN). (10 M Nov 2020)

7. How does the Apriori algorithm work? Apply it to the following example. Assume the minimum support count is 2. (8 M July 2019)

| TID | List of Item IDs |
|---|---|
| T100 | I1, I2, I5 |
| T200 | I2, I4 |
| T300 | I2, I3 |
| T400 | I1, I2, I4 |
| T500 | I1, I3 |
| T600 | I2, I3 |
| T700 | I1, I3 |
| T800 | I1, I2, I3, I5 |
| T900 | I1, I2, I3 |
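
The level-wise search of Apriori (join candidates of size k from frequent itemsets of size k-1, prune any candidate with an infrequent subset, then count support) can be traced on the nine transactions above. This is a compact sketch for checking hand-worked tables, assuming a minimum support count of 2 as stated in the question; the function and variable names are illustrative.

```python
from itertools import combinations

# Transactions copied from the table above.
TRANSACTIONS = {
    "T100": {"I1", "I2", "I5"},
    "T200": {"I2", "I4"},
    "T300": {"I2", "I3"},
    "T400": {"I1", "I2", "I4"},
    "T500": {"I1", "I3"},
    "T600": {"I2", "I3"},
    "T700": {"I1", "I3"},
    "T800": {"I1", "I2", "I3", "I5"},
    "T900": {"I1", "I2", "I3"},
}
MIN_SUPPORT_COUNT = 2


def support_count(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for items in TRANSACTIONS.values() if itemset <= items)


def keep_frequent(candidates):
    counts = {c: support_count(c) for c in candidates}
    return {c: n for c, n in counts.items() if n >= MIN_SUPPORT_COUNT}


def apriori():
    """Return {frozenset: support count} for all frequent itemsets."""
    items = sorted({item for t in TRANSACTIONS.values() for item in t})
    level = keep_frequent(frozenset([item]) for item in items)
    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = keep_frequent(candidates)
    return frequent


frequent = apriori()
for itemset, count in sorted(frequent.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

For this data the search stops after the 3-itemsets: {I1, I2, I3} and {I1, I2, I5} each occur twice, and the only 4-item candidate is pruned because {I1, I3, I5} is not frequent.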

8. Describe the advantages and disadvantages of a regression model. (8 M Jan 2020)

9. Write the different steps involved in developing an artificial neural network. (5 M Jan 2020)

10. Describe the advantages of using ANN. (3 M Jan 2020)

11. For the following example, describe the different steps of forming association rules using the Apriori algorithm with a support of 33% and a confidence of 50%. (8 M Jan 2020, Nov 2020)

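| Transaction | Items |
|---|---|
| 1 | Milk, Egg, Bread, Butter |
| 2 | Milk, Butter, Egg, Ketchup |
| 3 | Bread, Butter, Ketchup |
| 4 | Milk, Bread, Butter |
| 5 | Bread, Butter, Cookies |
| 6 | Milk, Bread, Butter, Cookies |
| 7 | Milk, Cookies |
| 8 | Milk, Bread, Butter |
| 9 | Bread, Butter, Egg, Cookies |
| 10 | Milk, Butter, Bread |
| 11 | Milk, Bread, Butter |
| 12 | Milk, Bread, Cookies, Ketchup |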
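
Once the frequent itemsets are known, each one is split into a left-hand side and a right-hand side, and a rule is kept when its confidence, support(itemset) / support(left-hand side), meets the threshold. The sketch below applies this to the twelve transactions above with the stated 33% support and 50% confidence. Note that, for brevity, it finds frequent itemsets by brute-force enumeration over the six items rather than by the level-wise Apriori search; only the rule-generation step is the point here. All names are illustrative.

```python
from itertools import combinations

# Baskets copied from the transactions above.
BASKETS = [
    {"Milk", "Egg", "Bread", "Butter"},
    {"Milk", "Butter", "Egg", "Ketchup"},
    {"Bread", "Butter", "Ketchup"},
    {"Milk", "Bread", "Butter"},
    {"Bread", "Butter", "Cookies"},
    {"Milk", "Bread", "Butter", "Cookies"},
    {"Milk", "Cookies"},
    {"Milk", "Bread", "Butter"},
    {"Bread", "Butter", "Egg", "Cookies"},
    {"Milk", "Butter", "Bread"},
    {"Milk", "Bread", "Butter"},
    {"Milk", "Bread", "Cookies", "Ketchup"},
]
MIN_SUPPORT = 0.33
MIN_CONFIDENCE = 0.50


def support(itemset):
    """Fraction of baskets that contain every item of the itemset."""
    return sum(1 for basket in BASKETS if itemset <= basket) / len(BASKETS)


# Brute-force enumeration of frequent itemsets (fine for six distinct items).
items = sorted(set().union(*BASKETS))
frequent = {}
for size in range(1, len(items) + 1):
    for combo in combinations(items, size):
        s = support(frozenset(combo))
        if s >= MIN_SUPPORT:
            frequent[frozenset(combo)] = s

# Rule generation: every non-empty proper subset is a candidate left-hand side.
rules = []
for itemset, s in frequent.items():
    for size in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), size):
            lhs = frozenset(lhs)
            confidence = s / frequent[lhs]
            if confidence >= MIN_CONFIDENCE:
                rules.append((sorted(lhs), sorted(itemset - lhs), s, confidence))

for lhs, rhs, s, confidence in rules:
    print(f"{lhs} -> {rhs}  support={s:.2f}  confidence={confidence:.2f}")
```

With these thresholds an itemset needs to appear in at least 4 of the 12 baskets, so Egg and Ketchup (3 baskets each) drop out at the first level. The rule Cookies -> Bread passes (confidence 4/5), while Bread -> Cookies does not (confidence 4/10).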

12. For the given dataset (City Size, Avg. Income, Local Investors, LOHAS Awareness), apply the decision tree algorithm and find the optimal decision tree. Also, predict the class label for the new example.

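| City Size | Avg. Income | Local Investors | LOHAS Awareness | Decision |
|---|---|---|---|---|
| Big | High | Yes | High | Yes |
| Med | Med | No | Med | No |
| Small | Low | Yes | Low | No |
| Big | High | No | High | Yes |
| Small | Med | Yes | High | No |
| Med | High | Yes | Med | Yes |
| Med | Med | Yes | Med | No |
| Big | Med | No | Med | No |
| Med | High | Yes | Low | No |
| Small | High | No | High | Yes |
| Small | Med | No | High | No |
| Med | High | No | Med | No |

| City Size | Avg. Income | Local Investors | LOHAS Awareness | Decision |
|---|---|---|---|---|
| Med | Med | No | Med | ? |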
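
Another way to pick the root is to count misclassifications: for each attribute, predict the majority class within each of its values and add up the rows that rule gets wrong. The sketch below applies this error-count criterion to the table above. It assumes that "Medium" and "Med" in the City Size column denote the same category and that "Heigh" is a misspelling of "High"; the names `majority_rules`, `errors` and `root_rules` are illustrative.

```python
from collections import Counter, defaultdict

ATTRS = ["City Size", "Avg. Income", "Local Investors", "LOHAS Awareness"]

# Rows from the table above, with "Medium" read as "Med" and "Heigh" as "High"
# (both are assumptions about the intended values).
DATA = [
    ("Big", "High", "Yes", "High", "Yes"),
    ("Med", "Med", "No", "Med", "No"),
    ("Small", "Low", "Yes", "Low", "No"),
    ("Big", "High", "No", "High", "Yes"),
    ("Small", "Med", "Yes", "High", "No"),
    ("Med", "High", "Yes", "Med", "Yes"),
    ("Med", "Med", "Yes", "Med", "No"),
    ("Big", "Med", "No", "Med", "No"),
    ("Med", "High", "Yes", "Low", "No"),
    ("Small", "High", "No", "High", "Yes"),
    ("Small", "Med", "No", "High", "No"),
    ("Med", "High", "No", "Med", "No"),
]
QUERY = ("Med", "Med", "No", "Med")


def majority_rules(rows, i):
    """Map each value of attribute i to (majority label, rows misclassified)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[i]].append(row[-1])
    rules = {}
    for value, labels in groups.items():
        label, hits = Counter(labels).most_common(1)[0]
        rules[value] = (label, len(labels) - hits)
    return rules


# Total errors made by the one-level rule built on each attribute.
errors = {
    name: sum(err for _, err in majority_rules(DATA, i).values())
    for i, name in enumerate(ATTRS)
}
root = min(errors, key=errors.get)
root_rules = majority_rules(DATA, ATTRS.index(root))
prediction = root_rules[QUERY[ATTRS.index(root)]][0]

print(errors)
print("Root attribute:", root)
print("Prediction for the new example:", prediction)
```

Avg. Income gives the fewest errors (2 of 12). All five rows with Avg. Income = Med are labelled "No", so the new example falls into a pure branch and is classified "No" without any further split; the High branch would still need to be split to complete the tree.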

Big Data Analytics Module 5 Question Bank with Answers

1. Compare text mining with data mining. (8 M Nov 2020)

2. What is the Naïve Bayes technique? Explain its model. (5 M July 2019)

3. Explain the steps in the text mining process and its architecture. (8 M Nov 2020)

4. What is a support vector machine? Explain its model. (8 M July 2019)

5. Mention the 3-step process of Text Mining. (3 M July 2019)

6. Explain briefly the three different types of web mining. (6 M July 2019)

7. Compute the rank values for the nodes of the network shown in the figure below. Which is the highest-ranked node? Solve the same with eight iterations. (8 M July 2019, Nov 2020)

[Figure: network for which the rank values are to be computed]

8. Describe the difference between text mining and data mining. (6 M)

9. Explain the Naïve Bayes model. What are the advantages and disadvantages of the Naïve Bayes model?

10. Briefly describe the support vector machine (SVM) technique. (4 M)

11. What are the advantages and disadvantages of a support vector machine (SVM)?

12. Explain the Naïve Bayes model to classify the text data into the right class using the following dataset. (6 M)

| Document ID | Keywords in the document | Class h |
|---|---|---|
| 1 | Love Happy Joy Joy Happy | Yes |
| 2 | Happy Love Kick Joy Happy | Yes |
| 3 | Love Move Joy Good | Yes |
| 4 | Love Happy Joy Love Pain | Yes |
| 5 | Joy Love Pain Kick Pain | No |
| 6 | Pain Pain Love kick | No |
| 7 | Love Pain Joy Love Kick | ? |
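
Document 7 can be classified by multiplying each class prior by the per-word likelihoods. The sketch below is a multinomial Naïve Bayes calculation on the table above, assuming add-one (Laplace) smoothing and case-insensitive words (so "kick" and "Kick" are the same term); a course solution may smooth differently. The names `score` and `tokenize` are illustrative.

```python
from collections import Counter

# Labelled documents 1 to 6 from the table above.
TRAIN = [
    ("Love Happy Joy Joy Happy", "Yes"),
    ("Happy Love Kick Joy Happy", "Yes"),
    ("Love Move Joy Good", "Yes"),
    ("Love Happy Joy Love Pain", "Yes"),
    ("Joy Love Pain Kick Pain", "No"),
    ("Pain Pain Love kick", "No"),
]
QUERY = "Love Pain Joy Love Kick"  # document 7


def tokenize(text):
    return text.lower().split()


vocab = {word for text, _ in TRAIN for word in tokenize(text)}
doc_counts = Counter()   # documents per class, for the priors
word_counts = {}         # class -> Counter of word frequencies
for text, label in TRAIN:
    doc_counts[label] += 1
    word_counts.setdefault(label, Counter()).update(tokenize(text))


def score(label, text):
    """P(class) * product of P(word | class) with add-one smoothing."""
    total_words = sum(word_counts[label].values())
    p = doc_counts[label] / len(TRAIN)
    for word in tokenize(text):
        p *= (word_counts[label][word] + 1) / (total_words + len(vocab))
    return p


scores = {label: score(label, QUERY) for label in doc_counts}
predicted = max(scores, key=scores.get)

for label, value in scores.items():
    print(f"P({label}) * P(doc7 | {label}) = {value:.7f}")
print("Predicted class:", predicted)
```

Under these assumptions the vocabulary has 7 words, class Yes has 19 word occurrences and class No has 9. The unnormalised score for No (about 0.0000858) is larger than for Yes (about 0.0000485), so document 7 is assigned to class No.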

13. What is web mining? Explain the different types of web mining. (8 M)

14. Explain three types of web mining. Use an appropriate flow diagram to represent the same. (8 M Nov 2020)

15. Write a short note on Social Network Analysis (SNA). Numerical examples on Naïve Bayes Model, SVM, and SNA (Rank Calculation).

16. Suppose we have the height, weight, and T-shirt size of some customers, and we need to predict the T-shirt size of a new customer given only the height and weight. The data, including height, weight, and T-shirt size, is shown below.

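| Height (in cm) | Weight (in kg) | T-Shirt Size |
|---|---|---|
| 158 | 58 | M |
| 158 | 59 | M |
| 160 | 60 | M |
| 163 | 60 | M |
| 163 | 61 | L |
| 160 | 64 | L |
| 163 | 64 | L |
| 165 | 61 | L |
| 165 | 62 | L |
| 165 | 65 | L |
| 170 | 63 | L |
| 170 | 64 | L |
| 170 | 68 | L |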

Determine the T-shirt size of a new customer with a weight of 61 kg and a height of 161 cm using KNN with K = 5.
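
The KNN steps (compute the distance from the new customer to every known customer, take the K closest, and vote) can be sketched as follows. The sketch assumes unscaled Euclidean distance on the raw height and weight values and reads the 13 customers in the order given in the data above; it ranks by squared distance, which gives the same ordering as the distance itself while keeping the arithmetic in integers. Names are illustrative.

```python
from collections import Counter

# (height in cm, weight in kg, T-shirt size), in the order given in the data.
CUSTOMERS = [
    (158, 58, "M"), (158, 59, "M"), (160, 60, "M"), (163, 60, "M"),
    (163, 61, "L"), (160, 64, "L"), (163, 64, "L"), (165, 61, "L"),
    (165, 62, "L"), (165, 65, "L"), (170, 63, "L"), (170, 64, "L"),
    (170, 68, "L"),
]
K = 5
NEW_CUSTOMER = (161, 61)  # height 161 cm, weight 61 kg


def squared_distance(a, b):
    """Squared Euclidean distance on (height, weight)."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


# sorted() is stable, so rows at equal distance keep their table order.
ranked = sorted(CUSTOMERS, key=lambda c: squared_distance(c, NEW_CUSTOMER))
neighbours = ranked[:K]
votes = Counter(size for _, _, size in neighbours)
prediction = votes.most_common(1)[0][0]

for height, weight, size in neighbours:
    d2 = squared_distance((height, weight), NEW_CUSTOMER)
    print(f"({height}, {weight}) {size}  squared distance = {d2}")
print("Votes:", dict(votes), "-> predicted size:", prediction)
```

With the rows as listed, the fifth and sixth closest customers, (158, 59) M and (163, 64) L, are both at squared distance 13. The stable sort keeps the earlier row, which gives three votes for M against two for L; a different tie-breaking rule would reverse the vote, so the tie is worth stating explicitly in a written answer.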

