In machine learning, fitting a model means training it to recognize patterns in data, while predicting involves using those patterns to forecast outcomes. This guide walks through this core workflow using house price prediction as an example.
Key Concepts:
fit(): Trains the model on data.
predict(): Generates predictions from new inputs.
import pandas as pd
data = pd.read_csv('housing_data.csv')
# View structure and missing values
print(data.head())
print(data.isnull().sum())
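The missing-value counts printed above only report the problem; they do not fix it. The following is a minimal sketch of one possible response, dropping incomplete rows. The small `raw` frame is invented for illustration and merely stands in for `housing_data.csv`; imputing values is an equally valid alternative.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame standing in for housing_data.csv (assumption).
raw = pd.DataFrame({
    "SquareFeet": [1500, 2100, np.nan, 1800],
    "Bedrooms": [3, 4, 2, np.nan],
    "SalePrice": [300_000, 410_000, 250_000, 330_000],
})

# Keep only rows where every column used for modelling is present.
needed = ["SquareFeet", "Bedrooms", "SalePrice"]
clean = raw.dropna(subset=needed)

print(len(raw), "rows before,", len(clean), "rows after")
```

Dropping rows is the simplest choice, but it discards data, so it is best suited to cases where only a small share of rows is incomplete.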
X = data[['SquareFeet', 'Bedrooms']]
y = data['SalePrice']
from sklearn.tree import DecisionTreeRegressor
model = DecisionTreeRegressor()
model.fit(X, y) # Fit the model to the data
Internally, the fitted tree encodes rules of the following form (illustrative values): if SquareFeet > 2000 and Bedrooms == 3, predict $420,000.
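To see the rules a tree has actually learned, scikit-learn's `export_text` can print them. The sketch below uses synthetic data, since `housing_data.csv` is not available here; the price formula and the `demo_` names are assumptions made only for this example, and `max_depth=2` is chosen just to keep the output short.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic stand-in for the housing data (all values are made up).
rng = np.random.default_rng(0)
demo_X = pd.DataFrame({
    "SquareFeet": rng.integers(800, 3500, size=200),
    "Bedrooms": rng.integers(1, 6, size=200),
})
demo_y = (
    100 * demo_X["SquareFeet"]
    + 15_000 * demo_X["Bedrooms"]
    + rng.normal(0, 10_000, size=200)
)

# A shallow tree keeps the printed rule set small enough to read.
demo_tree = DecisionTreeRegressor(max_depth=2, random_state=0)
demo_tree.fit(demo_X, demo_y)

# export_text renders the learned splits as nested threshold rules.
rules = export_text(demo_tree, feature_names=["SquareFeet", "Bedrooms"])
print(rules)
```

Each leaf line in the output shows the value the tree predicts for houses that satisfy the conditions above it.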
# New houses to predict; column names match the features used in fit()
new_data = pd.DataFrame([[1800, 2], [2400, 4]], columns=['SquareFeet', 'Bedrooms'])
predictions = model.predict(new_data)
print(predictions) # e.g., [320000, 475000]
from sklearn.metrics import mean_absolute_error
# Error on rows the model has already seen, so this figure is optimistic
train_predictions = model.predict(X)
mae = mean_absolute_error(y, train_predictions)
print(f"Training Error: ${mae:.2f}")
from sklearn.model_selection import train_test_split
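# Hold out 20% of the rows for validation; random_state makes the split reproducible
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
model.fit(X_train, y_train)  # Refit on the training split only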
val_mae = mean_absolute_error(y_val, model.predict(X_val))
print(f"Validation Error: ${val_mae:.2f}")
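The reason for computing both errors is the gap between them: an unconstrained tree can reproduce its training rows almost exactly while doing much worse on rows it has not seen. The sketch below illustrates this on synthetic data; the price formula, the noise level, and the `gap_` names are assumptions for this example only.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing data (all values are made up).
rng = np.random.default_rng(1)
n_rows = 500
gap_X = pd.DataFrame({
    "SquareFeet": rng.integers(800, 3500, size=n_rows),
    "Bedrooms": rng.integers(1, 6, size=n_rows),
})
gap_y = (
    100 * gap_X["SquareFeet"]
    + 15_000 * gap_X["Bedrooms"]
    + rng.normal(0, 20_000, size=n_rows)  # noise the model should not memorize
)

gap_X_train, gap_X_val, gap_y_train, gap_y_val = train_test_split(
    gap_X, gap_y, test_size=0.2, random_state=0
)

# No depth limit: the tree keeps splitting until it fits the training rows.
gap_tree = DecisionTreeRegressor(random_state=0).fit(gap_X_train, gap_y_train)

gap_train_mae = mean_absolute_error(gap_y_train, gap_tree.predict(gap_X_train))
gap_val_mae = mean_absolute_error(gap_y_val, gap_tree.predict(gap_X_val))
print(f"Training MAE:   ${gap_train_mae:,.2f}")
print(f"Validation MAE: ${gap_val_mae:,.2f}")
```

A training error far below the validation error is the usual sign that the model has fit noise rather than signal.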
from sklearn.ensemble import RandomForestRegressor
improved_model = RandomForestRegressor(n_estimators=100)
improved_model.fit(X_train, y_train)
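Fitting the random forest does not by itself show that it is an improvement; it needs to be scored on the same validation split as the tree. A minimal sketch of that comparison follows, again on synthetic data. The `cmp_` names and the data-generating formula are assumptions, and which model wins on a real dataset is not guaranteed.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing data (all values are made up).
rng = np.random.default_rng(2)
n_rows = 500
cmp_X = pd.DataFrame({
    "SquareFeet": rng.integers(800, 3500, size=n_rows),
    "Bedrooms": rng.integers(1, 6, size=n_rows),
})
cmp_y = (
    100 * cmp_X["SquareFeet"]
    + 15_000 * cmp_X["Bedrooms"]
    + rng.normal(0, 20_000, size=n_rows)
)

cmp_X_train, cmp_X_val, cmp_y_train, cmp_y_val = train_test_split(
    cmp_X, cmp_y, test_size=0.2, random_state=0
)

candidates = {
    "decision tree": DecisionTreeRegressor(random_state=0),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each candidate on the same training rows and score it on the same held-out rows.
val_scores = {}
for name, candidate in candidates.items():
    candidate.fit(cmp_X_train, cmp_y_train)
    val_scores[name] = mean_absolute_error(cmp_y_val, candidate.predict(cmp_X_val))
    print(f"{name}: validation MAE ${val_scores[name]:,.2f}")
```

Using one shared split for both models keeps the comparison fair, since any difference in error then comes from the models rather than from the data they happened to see.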
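Overfitting: The model memorized noise in the training data, so its training error is low but its validation error is high.
Fix: Simplify the model (for example, limit the tree depth) and use cross-validation to check how well it generalizes.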
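Both remedies can be sketched together: `max_depth` limits how complex the tree may become, and `cross_val_score` estimates the error over several train/validation splits instead of one. The data below is synthetic, and the depth of 4 is an arbitrary illustrative choice, not a recommended setting.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing data (all values are made up).
rng = np.random.default_rng(3)
n_rows = 500
cv_X = pd.DataFrame({
    "SquareFeet": rng.integers(800, 3500, size=n_rows),
    "Bedrooms": rng.integers(1, 6, size=n_rows),
})
cv_y = (
    100 * cv_X["SquareFeet"]
    + 15_000 * cv_X["Bedrooms"]
    + rng.normal(0, 20_000, size=n_rows)
)

cv_mae = {}
for depth in (None, 4):  # None = unlimited depth, 4 = simplified tree
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0)
    # scikit-learn reports MAE as a negative score so that higher is always better.
    neg_mae = cross_val_score(
        tree, cv_X, cv_y, cv=5, scoring="neg_mean_absolute_error"
    )
    cv_mae[depth] = -neg_mae.mean()
    print(f"max_depth={depth}: mean CV MAE ${cv_mae[depth]:,.2f}")
```

In practice one would try several depths and keep the one with the lowest cross-validated error.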
Fitting and predicting are the core of machine learning: train a model, validate it on held-out data, and refine it until the results are accurate.