I want a report in the same format as the one attached.
The subject is: the impact of e-learning on the performance of university students.
Data mining using Weka.
Provide me with the dataset you will use.
Report Outline
1. Format of the Report: 4 pages; 2 columns; font: Times New Roman, Size 10; Single spacing.
2. Provide a good title that reflects your work.
3. Start with an Abstract that summarizes your work
4. Introduction (½ page – 1 page): Background, problem description, data collection (if any), etc.
5. Similar Work (½ page – 1 page): a short survey showing some similar works that have been conducted in the same proposed field of yours.
6. Your work (1-2 pages): all the development stages you have gone through.
7. Results (½ page): Summarize your findings, observations,
8. Use proper citation and a list of references at the end of the report.
9. Support your report with diagrams/figures/tables whenever necessary.
How Do the Features of Cars Affect User Acceptance? Using Data Mining to Estimate the Acceptance of Cars Based on Their Features
Abrar Salman
Information Technology Department
University of Bahrain
Sakheer, Kingdom of Bahrain
[email protected]

Fatima Ali
Information Technology Department
University of Bahrain
Sakheer, Kingdom of Bahrain
[email protected]
Supervised by: Dr. Ahmed Zeki
Abstract
The features of a car have a great impact on its level of acceptance among users. The aim of this study is to find the hidden relations and patterns between the features of a car and user acceptance using data mining. This study will help car dealers decide what is going to be in demand, and help manufacturers improve their final products. The data mining techniques used are Naïve Bayes, Simple K-Means and association rules.
Index Terms— car, acceptance, features.
I. INTRODUCTION
Car manufacturers aim to be the best in the market by offering competitive features in many aspects, such as price, appearance, comfort, performance and safety, to make their products more acceptable to users.
Data mining is the extraction of interesting patterns or knowledge from huge amounts of data. In this project, data mining is used to measure how acceptance is affected by a car's features.
II. DATA COLLECTION
To find the relation between the features and the acceptance level, we looked for an existing dataset and found the Car Evaluation Database, created by Marko Bohanec in 1997 [3]. This dataset suits our research because it has 6 attributes and 1728 instances, which should give us more reliable results.
III. DATA PREPARATION
The dataset was already in the Attribute-Relation File Format (.arff); the data was clean, nothing was missing, and all the attributes were nominal. We made some basic improvements by renaming the class attribute to “accept” and combining two of its values (vgood and good) into one value, good, since both had very few instances and they refer to almost the same thing.
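The class merge described above can be sketched in Python as follows. This is only an illustration, assuming the class values are available as plain strings; the function name `merge_class` is hypothetical, and the study may have made the change directly in the .arff file or in Weka.

```python
def merge_class(value):
    """Map the original class value to the merged scheme (vgood becomes good)."""
    return "good" if value == "vgood" else value


# The four original class values of the dataset.
original_values = ["unacc", "acc", "good", "vgood"]

# After merging, only three distinct values remain.
merged_values = [merge_class(v) for v in original_values]
distinct_values = sorted(set(merged_values))
print(distinct_values)
```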
IV. ATTRIBUTES
The dataset has six nominal attributes along with the class attribute, as shown in Table 1.1, which lists the name of each attribute, a brief description and its possible values.
Table 1.1 Attributes
Attribute   Description                               Values
buying      Buying price.                             vhigh, high, med, low
maint       Price of the maintenance.                 vhigh, high, med, low
doors       Number of doors.                          2, 3, 4, 5more
persons     Capacity in terms of persons to carry.    2, 4, more
lug_boot    The size of luggage boot.                 small, med, big
safety      Estimated safety of the car.              low, med, high
accept      Car acceptability.                        unacc, acc, good
V. OBJECTIVES
The main objective of this project is to use three different data mining techniques to analyze the data and extract any possible patterns or knowledge. The techniques used in this project are Naïve Bayes, Association Rules and Clustering using Simple K-means.
A. Classification: Naïve Bayes
Naïve Bayes is a probabilistic classification algorithm. It is based on probability models that incorporate strong independence assumptions between the attributes. These assumptions often do not hold in reality, which is why they are considered naive. [1]
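To make the idea concrete, here is a minimal sketch of a Naïve Bayes classifier for nominal attributes. It is not Weka's implementation: the toy rows, the function names and the use of add-one (Laplace) smoothing are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    values_seen = defaultdict(set)       # attribute index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def predict(model, row):
    """Return the class with the highest prior times product of likelihoods."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            # Add-one smoothed estimate of P(value | class); the attributes
            # are treated as independent given the class (the naive assumption).
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(values_seen[i])
            score *= numerator / denominator
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: each row is (persons, safety).
rows = [("2", "low"), ("2", "high"), ("4", "low"),
        ("4", "high"), ("more", "high"), ("more", "med")]
labels = ["unacc", "unacc", "unacc", "acc", "acc", "acc"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("4", "high")))
print(predict(model, ("2", "low")))
```

The classifier multiplies one probability per attribute, so each attribute contributes to the decision independently of the others.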
figure 1.1 Naïve Bayes summary using Cross-Validation
As shown in figure 1.1, the classifier correctly classified 1500 of the 1728 instances (86.8056%), which means the accuracy of the model is about 86.8%.
The remaining 228 of the 1728 instances (13.1944%) were incorrectly classified.
figure 1.2 Detailed accuracy by class
There are three classes, which refer to the acceptability of the car: unacc (unacceptable), acc (acceptable) and good.
TP Rate: rate of true positives (the proportion of instances of a class that are correctly classified as that class).
FP Rate: rate of false positives (the proportion of instances not belonging to a class that are incorrectly classified as that class).
Precision: the number of instances correctly classified as a given class divided by the total number of instances classified as that class.
Recall: the number of instances correctly classified as a given class divided by the actual number of instances in that class.
F-Measure: calculated as 2 × precision × recall divided by the sum of precision and recall.
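The definitions above can be written out as a short Python sketch. The counts used here (8 true positives, 2 false positives, 4 false negatives) are hypothetical and are not taken from the study; the function name is also only illustrative.

```python
def precision_recall_f(tp, fp, fn):
    """Compute precision, recall and F-measure for one class.

    tp: instances of the class that were classified as the class
    fp: instances of other classes that were classified as the class
    fn: instances of the class that were classified as another class
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts for a single class.
precision, recall, f_measure = precision_recall_f(tp=8, fp=2, fn=4)
print(round(precision, 3), round(recall, 3), round(f_measure, 3))
```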
figure 1.3 Naïve Bayes Confusion Matrix
The figure above shows that 1161 instances of class a (unacc) are correctly classified as a and 49 instances of class a are incorrectly classified as b or c; 272 instances of class b (acc) are correctly classified as b and 112 are incorrectly classified as a or c; 67 instances of class c (good) are correctly classified as c and 67 are incorrectly classified as b.
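As a cross-check, the overall accuracy and the per-class recall can be recomputed from the counts reported for the confusion matrix. The sketch below uses only those counts; the text does not give the split of the misclassified instances between the other classes, so precision is not computed here.

```python
# Correctly and incorrectly classified instances per actual class,
# as reported for the confusion matrix.
correct = {"unacc": 1161, "acc": 272, "good": 67}
incorrect = {"unacc": 49, "acc": 112, "good": 67}

total = sum(correct.values()) + sum(incorrect.values())
accuracy = sum(correct.values()) / total

# Recall per class: correctly classified instances over all actual instances.
recall = {c: correct[c] / (correct[c] + incorrect[c]) for c in correct}

print(total, round(accuracy * 100, 4))
for name, value in recall.items():
    print(name, round(value, 3))
```

The total comes to 1728 instances with 1500 correct classifications, matching the summary output. The good class has the lowest recall, with half of its instances classified correctly.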
B. Clustering: Simple K-Means
The second technique used in this report is clustering with k-means. K-means is an unsupervised statistical clustering technique that is simple yet has proved to be effective.
The aim of this project is to show that the better the features a car has, the more accepted by users it will be.
In this step, the data is grouped into different clusters to see the different relations between the features and the acceptance. We decided to use seven clusters after applying the elbow method.
figure 2.1 Simple K-Means
After applying K-means to the dataset, the clusters are characterized as follows: clusters 0, 2, 3 and 5 are unacceptable, clusters 4 and 6 are acceptable, and cluster 1 is good.
figure 2.2 Clustered Instances
For the clustered instances output, it shows that 27% of the data was clustered in cluster 0, 12% in cluster 1, 14% in cluster 2, 15% in cluster 3, 11% in cluster 4, 12% in cluster 5, and finally 9% in cluster 6.
figure 2.3 Final cluster centroids
Simple K-means results:
Table 1.2 Cluster 0 model
Attribute   Value
buying      vhigh
maint       med
doors       5more
persons     2
lug_boot    small
safety      low
accept      unacc

Table 1.3 Cluster 4 model
Attribute   Value
buying      high
maint       high
doors       3
persons     more
lug_boot    med
safety      high
accept      acc
Clusters 0 and 4 are a good example of how the features can affect the acceptance of users. As shown in cluster 0, if the prices are high and the features are poor, users will refuse to accept the car because they feel their money is not well spent. On the other hand, in cluster 4, users accept paying a high amount to buy and maintain the car if they get better features in return.
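The assignment step of k-means on nominal data can be illustrated with the two centroids reported in Tables 1.2 and 1.3. The sketch below assumes that the distance between a record and a centroid is the number of attributes on which they differ; this is an assumption for illustration and may not match the exact distance Weka uses. The example record is hypothetical, and the other five centroids are omitted because the text does not list them.

```python
ATTRIBUTES = ["buying", "maint", "doors", "persons", "lug_boot", "safety", "accept"]

# Centroids of clusters 0 and 4, in the attribute order above.
CENTROIDS = {
    0: ["vhigh", "med", "5more", "2", "small", "low", "unacc"],
    4: ["high", "high", "3", "more", "med", "high", "acc"],
}


def mismatch_distance(record, centroid):
    """Number of attributes on which the record and the centroid differ."""
    return sum(a != b for a, b in zip(record, centroid))


def nearest_cluster(record, centroids):
    """Return the id of the centroid with the smallest mismatch distance."""
    return min(centroids, key=lambda cid: mismatch_distance(record, centroids[cid]))


# Hypothetical record: expensive, but roomy and safe.
record = ["high", "high", "4", "more", "med", "high", "acc"]
assigned = nearest_cluster(record, CENTROIDS)
print(assigned, mismatch_distance(record, CENTROIDS[assigned]))
```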
C. Association Rules: Apriori
The Apriori algorithm is a classical data mining algorithm used for mining frequent itemsets and relevant association rules. [2]
This method shows the relations between the attributes and the class, and the best rules for the data, based on a minimum confidence that was set to 90%.
We extracted 13 rules for this data set as shown below in figure 3.1:
figure 3.1 Apriori
The confidence level for all the rules is 100%. The rules show that users find a car unacceptable when:
1. The number of persons is 2 only.
2. The car has low safety.
3. It is 2 persons with a small luggage boot.
4. It is 2 persons with a medium luggage boot.
5. It is 2 persons with a big luggage boot.
6. 2 persons and low safety.
7. 2 persons and medium safety.
8. 2 persons and high safety.
9. 4 persons and low safety.
10. More persons with low safety.
11. Small luggage boot with low safety.
12. Medium luggage boot with low safety.
13. Big luggage boot with low safety.
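The support and confidence of a rule of this kind can be computed as in the sketch below. The six records are a hypothetical toy sample, not the real dataset, and the helper names are illustrative; the point is only to show what a confidence of 100% means.

```python
def matches(record, items):
    """True if the record has every attribute=value pair in items."""
    return all(record.get(key) == value for key, value in items.items())


def support(records, items):
    """Fraction of all records that contain the given items."""
    return sum(matches(r, items) for r in records) / len(records)


def confidence(records, antecedent, consequent):
    """Fraction of records matching the antecedent that also match the consequent."""
    covered = [r for r in records if matches(r, antecedent)]
    if not covered:
        return 0.0
    return sum(matches(r, consequent) for r in covered) / len(covered)


# Hypothetical toy records.
records = [
    {"persons": "2", "safety": "low", "accept": "unacc"},
    {"persons": "2", "safety": "high", "accept": "unacc"},
    {"persons": "4", "safety": "low", "accept": "unacc"},
    {"persons": "4", "safety": "high", "accept": "acc"},
    {"persons": "more", "safety": "med", "accept": "acc"},
    {"persons": "more", "safety": "high", "accept": "good"},
]

rule_support = support(records, {"persons": "2", "accept": "unacc"})
rule_confidence = confidence(records, {"persons": "2"}, {"accept": "unacc"})
print(round(rule_support, 3), rule_confidence)
```

In this toy sample every record with persons=2 is unacc, so the rule has a confidence of 1.0 and would pass a 90% minimum confidence.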
CONCLUSION
From the outputs of the three data mining techniques, several points can be concluded:
The accuracy of the Naïve Bayes classifier is approximately 87%, which is considered a good result.
K-means characterized the clusters as follows: clusters 0, 2, 3 and 5 are unacceptable, clusters 4 and 6 are acceptable, and cluster 1 is good.
The best rule and the strongest relationship found by the association rules method is between the number of persons and the unacceptability of the car.
ACKNOWLEDGMENT
Many thanks to the creator and donor, Marko Bohanec, and the donor, Blaz Zupan, for their useful database, which helped us complete our project. We also thank our instructor, Dr. Ahmed Zeki, and the lab assistant, Miss Hajer Khalifa, for their efforts and their time explaining the course material in an interesting way.
REFERENCES
[1] "IBM Knowledge Center", Ibm.com, 2018. [Online]. Available: https://www.ibm.com/support/knowledgecenter/en/SSEPGG_9.7.0/com.ibm.im.overview.doc/c_naive_bayes_classification.html. [Accessed: 30-Dec-2018].
[2] "Rashmi Jain, Author at HackerEarth Blog", HackerEarth Blog, 2018. [Online]. Available: https://www.hackerearth.com/blog/author/rashmi/?post. [Accessed: 30-Dec-2018].
[3] "renatopp/arff-datasets", GitHub, 2018. [Online]. Available: https://github.com/renatopp/arff-datasets/blob/master/classification/car.arff#L1. [Accessed: 31-Dec-2018].
APPENDIX
1. Choosing the number of clusters by using the elbow method.
Number of clusters   Sum of squared errors
2                    6577.0
3                    6073.0
4                    5727.0
5                    5596.0
6                    5303.0
7                    5077.0
8                    4974.0
9                    4889.0
10                   4802.0
[Chart: sum of squared errors (vertical axis, 4000 to 7000) against number of clusters (horizontal axis, 0 to 11)]
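One simple way to read the elbow from the table above is to look at how much the sum of squared errors drops each time a cluster is added. The sketch below uses the values from the table; treating the elbow as the point after which the drops stay small is an informal heuristic and an assumption here, not necessarily the exact procedure the study followed.

```python
# Sum of squared errors for each number of clusters, from the table above.
sse = {2: 6577.0, 3: 6073.0, 4: 5727.0, 5: 5596.0, 6: 5303.0,
       7: 5077.0, 8: 4974.0, 9: 4889.0, 10: 4802.0}

ks = sorted(sse)

# Drop in error obtained by moving from k - 1 clusters to k clusters.
drops = {k: sse[prev] - sse[k] for prev, k in zip(ks, ks[1:])}

for k, drop in drops.items():
    print(f"k={k}: error reduced by {drop:.1f}")
```

With these values, every step after k = 7 reduces the error by only about 100 or less, which is consistent with the choice of seven clusters.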