Iris Classification Dataset with Decision Trees
- Suhas Bhairav

- Jan 25, 2024
Introduction:
Classifying iris flowers by their sepal and petal measurements is one of the classic introductory problems in machine learning. In this blog post, we'll walk through a short Python snippet that uses the scikit-learn library to train a Decision Tree on the Iris dataset, explaining both the code and the basic idea behind this intuitive, transparent algorithm.
Libraries Used:
The code uses several scikit-learn modules, centered on the DecisionTreeClassifier for decision tree-based classification.
1. scikit-learn: A comprehensive machine learning library that provides tools for loading data, building models, and evaluating them.
2. Decision Tree: A model that predicts by applying a sequence of threshold tests to the input features, following one branch per test until it reaches a leaf that holds the predicted class.
3. DecisionTreeClassifier: scikit-learn's implementation of decision trees for classification tasks.
4. Iris Dataset: A classic dataset for classification, with 150 samples, four measurements per flower, and three species.
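Before training anything, it can help to look at what the dataset actually contains. The following is a minimal sketch, not part of the original snippet, that prints the shape, feature names, and class names exposed by `load_iris`:

```python
from sklearn.datasets import load_iris

# Load the dataset bundled with scikit-learn
iris = load_iris()

# 150 flowers, each described by 4 measurements
print(iris.data.shape)

# Names of the four measurement columns (sepal/petal length and width, in cm)
print(iris.feature_names)

# Names of the three species, indexed by the integer labels in iris.target
print(iris.target_names)
```

The integer labels in `iris.target` (0, 1, 2) index into `iris.target_names`, which is how a numeric prediction can be mapped back to a species name.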
Code Explanation:
    # Import necessary modules
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score
    from sklearn.tree import DecisionTreeClassifier

    # Load the Iris dataset
    iris = load_iris()
    X = iris.data
    y = iris.target

    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    # Initialize a Decision Tree Classifier
    clf = DecisionTreeClassifier()

    # Train the classifier on the training data
    clf.fit(X_train, y_train)

    # Make predictions on the test data
    y_pred = clf.predict(X_test)

    # Print the accuracy score of the classifier
    print(accuracy_score(y_test, y_pred))

Explanation:
1. Loading the Dataset: Our exploration begins with loading the Iris dataset using the `load_iris` function from scikit-learn. This dataset contains measurements of sepal length, sepal width, petal length, and petal width for three species of iris flowers.
2. Data Splitting: The dataset is then split into training and testing sets using the `train_test_split` function. With `test_size=0.2`, 20% of the samples (30 of the 150 flowers) are held out for testing, so the model is trained on one subset and evaluated on a separate, unseen one. Because no `random_state` is set, the split, and therefore the reported accuracy, can change from run to run.
3. Decision Tree Classifier Initialization: An instance of the classifier is created from scikit-learn's `DecisionTreeClassifier` class with its default settings, which place no limit on the depth of the tree. Decision trees are known for their transparency and their ability to capture non-linear decision boundaries.
4. Training the Classifier: The classifier is trained on the training data using the `fit` method. During this phase, the decision tree learns to make decisions based on the features of the input data.
5. Making Predictions: Predictions are then made on the test data using the `predict` method. The decision tree's learned decision-making process is applied to classify iris flowers into their respective species.
6. Accuracy Calculation and Output: The `accuracy_score` function from scikit-learn computes the fraction of correctly predicted test instances, a value between 0 and 1 (multiply by 100 for a percentage). The result is then printed to the console.
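The six steps can also be sketched in a reproducible form. This variant is an illustration rather than the original code: `random_state=42` and `stratify=y` are assumptions added here so that the split is the same on every run and each species is equally represented in the test set.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load features and labels directly as arrays
X, y = load_iris(return_X_y=True)

# Hold out 20% for testing; random_state fixes the shuffle (assumed value),
# stratify=y keeps the three species balanced in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# random_state also fixes tie-breaking inside the tree (assumed value)
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# accuracy_score returns a fraction in [0, 1]; format it as a percentage
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.2%} on {len(y_test)} samples")
```

With only 30 test samples, a single misclassified flower shifts the accuracy by more than three percentage points, so small differences between runs or settings should be read with caution.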
Conclusion:
In this post, we trained and evaluated a decision tree on the Iris dataset in a handful of lines of scikit-learn code. Decision trees offer a transparent, interpretable way to make predictions from input features, because every prediction can be traced back to a short chain of feature tests. As you continue in machine learning, experimenting with other algorithms and comparing their strengths on the same data is a good way to build intuition for classification problems.
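To make the interpretability point concrete, scikit-learn's `export_text` can print a fitted tree as nested if/else rules. This is a small sketch, not part of the original snippet; `max_depth=2` and `random_state=0` are assumptions chosen only to keep the printed tree short and repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree (assumed depth limit) so the printed rules stay readable
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# Render the learned splits as indented text, using the real feature names
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

Each indented line is one threshold test on a measurement, and each `class:` line is a leaf, so the printed text is the model's complete decision procedure.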
The link to the GitHub repo is here.


