Physical Dimensionality and Semantic Backpropagation
One of the most innovative features of GeneExpressionProgramming.jl is its support for physical dimensionality constraints through semantic backpropagation. This feature ensures that evolved expressions respect physical units and dimensional homogeneity, making it particularly valuable for scientific and engineering applications where physical laws must be respected.
Understanding Physical Dimensionality
Dimensional Analysis Fundamentals
Physical dimensionality is based on the principle that any valid physical equation must be dimensionally homogeneous - all terms must have the same physical dimensions. The fundamental dimensions in the International System of Units (SI) are:
- Mass (M): kilogram [kg]
- Length (L): meter [m]
- Time (T): second [s]
- Temperature (K): kelvin [K]
- Electric Current (I): ampere [A]
- Amount of Substance (N): mole [mol]
- Luminous Intensity (J): candela [cd]
Any physical quantity can be expressed as a combination of these fundamental dimensions. For example:
- Velocity: [L T⁻¹]
- Force: [M L T⁻²]
- Energy: [M L² T⁻²]
- Electric Charge: [A T]
Dimensional Representation in GeneExpressionProgramming.jl
Dimensions are represented as vectors where each component corresponds to a fundamental unit:
# Dimension vector: [M, L, T, K, I, N, J]
velocity_dim = [0, 1, -1, 0, 0, 0, 0] # [L T⁻¹]
force_dim = [1, 1, -2, 0, 0, 0, 0] # [M L T⁻²]
energy_dim = [1, 2, -2, 0, 0, 0, 0] # [M L² T⁻²]
charge_dim = [0, 0, 0, 1, 1, 0, 0] # [A T]
Complete Physical Dimensionality Example
This example demonstrates how to use physical dimensionality constraints to discover the relationship for electric current density in superconductivity, based on the Feynman Lectures equation J = -ρqmA (Feynman III 21.20).
using GeneExpressionProgramming
using Random
using CSV
using DataFrames
using Statistics
using Plots
# Set random seed for reproducibility
Random.seed!(42)
println("=== Physical Dimensionality Example ===")
println("Discovering: J = -ρqA/m (Feynman III 21.20)")
println("J: electric current density, ρ: charge density, q: electric charge")
println("m: mass, A: magnetic vector potential")
println()
# Define target dimension for electric current density J
# J has dimensions of [A m⁻²] = [0, -2, 0, 0,1, 0, 0]
target_dim = Float16[0, -2, 0, 0, 1, 0, 0] # Current density: A/m²
println("Target dimension (J): [M, L, T, K, I, N, J] = $target_dim")
println("Physical meaning: Ampere per square meter [A m⁻²]")
println()
# Define dimensions for input features
# Based on the Feynman equation variables
feature_dims = Dict{Symbol,Vector{Float16}}(
:x1 => Float16[0, -3, 1, 0, 1, 0, 0], # ρ (charge density)
:x2 => Float16[0, 0, 1, 0, 1, 0, 0], # q (electric charge)
:x3 => Float16[1, 1, -2, 0, -1, 0, 0], # A (magnetic vector potential)
:x4 => Float16[1, 0, 0, 0, 0, 0, 0], # m (mass)
)
println("Feature dimensions:")
for (var, dim) in feature_dims
println(" $var: $dim")
end
println()
# Physical interpretation of features
interpretations = Dict(
:x1 => "ρ (charge density) [A s m⁻³]",
:x2 => "q (electric charge) [A s]",
:x3 => "A (magnetic vector potential) [kg m s⁻² A⁻¹]",
:x4 => "m (mass) [kg]"
)
println("Physical interpretation:")
for (var, interp) in interpretations
println(" $var: $interp")
end
println()
# Generate synthetic data based on the known relationship
# J = -ρ * q * A / m
n_samples = 500
println("Generating synthetic data...")
# J = -ρ * q * A / m
n_samples = 5000
# Load the data from the file => rho_c_0,q,A_vec,m,target
data = Matrix(CSV.read("./paper/srsd/feynman-III.21.20\$0.txt", DataFrame))
num_cols = size(data, 2)
x_train, y_train, x_test, y_test = train_test_split(data[:, 1:num_cols-1], data[:, num_cols]; consider=4)
println("Training samples: $(length(y_train))")
println("Test samples: $(length(y_test))")
println()
# Evolution parameters
epochs = 1000
population_size = 1000
num_features = num_cols - 1
# Create regressor with dimensional constraints
regressor = GepRegressor(
num_features;
considered_dimensions=feature_dims,
max_permutations_lib=10000, # Increase for more complex expressions
rounds=7 # Tree depth for dimensional checking
)
println("Regressor configuration:")
println(" Features: $num_features")
println(" Dimensional constraints: enabled")
println(" Max permutations: 10000")
println(" Tree depth (rounds): 7")
println()
println("Starting dimensionally-constrained evolution...")
start_time = time()
# Fit with dimensional constraints
fit!(regressor, epochs, population_size, x_train', y_train;
x_test=x_test', y_test=y_test,
loss_fun="mse",
target_dimension=target_dim)
training_time = time() - start_time
println("Training completed in $(round(training_time, digits=2)) seconds")
println()
# Analyze results
println("=== Results Analysis ===")
# Get the best model
best_model = regressor.best_models_[1]
best_expression = best_model.compiled_function
best_fitness = best_model.fitness
println("Best evolved expression:")
println(" $best_expression")
println(" Fitness (MSE): $best_fitness")
println()
# Make predictions
train_predictions = regressor(x_train')
test_predictions = regressor(x_test')
# Calculate comprehensive metrics
function calculate_detailed_metrics(y_true, y_pred, dataset_name)
mse = mean((y_true .- y_pred).^2)
mae = mean(abs.(y_true .- y_pred))
rmse = sqrt(mse)
# R² score
ss_res = sum((y_true .- y_pred).^2)
ss_tot = sum((y_true .- mean(y_true)).^2)
r2 = 1 - ss_res / ss_tot
# Mean Absolute Percentage Error
mape = mean(abs.((y_true .- y_pred) ./ y_true)) * 100
# Maximum error
max_error = maximum(abs.(y_true .- y_pred))
println("$dataset_name Performance:")
println(" MSE: $(round(mse, sigdigits=6))")
println(" MAE: $(round(mae, sigdigits=6))")
println(" RMSE: $(round(rmse, sigdigits=6))")
println(" R²: $(round(r2, digits=6))")
println(" MAPE: $(round(mape, digits=2))%")
println(" Max Error: $(round(max_error, sigdigits=6))")
println()
return (mse=mse, mae=mae, rmse=rmse, r2=r2, mape=mape, max_error=max_error)
end
train_metrics = calculate_detailed_metrics(y_train, train_predictions, "Training")
test_metrics = calculate_detailed_metrics(y_test, test_predictions, "Test")
# Visualization
println("Creating visualizations...")
# 1. Prediction accuracy plot
p1 = scatter(y_train, train_predictions,
xlabel="Actual Current Density",
ylabel="Predicted Current Density",
title="Training Set: Actual vs Predicted",
alpha=0.6,
color=:blue,
label="Training Data",
markersize=3)
scatter!(p1, y_test, test_predictions,
alpha=0.6,
color=:red,
label="Test Data",
markersize=3)
# Perfect prediction line
all_y = vcat(y_train, y_test)
all_pred = vcat(train_predictions, test_predictions)
min_val = min(minimum(all_y), minimum(all_pred))
max_val = max(maximum(all_y), maximum(all_pred))
plot!(p1, [min_val, max_val], [min_val, max_val],
color=:black, linestyle=:dash, linewidth=2, label="Perfect Prediction")
# 2. Residual analysis
train_residuals = y_train .- train_predictions
test_residuals = y_test .- test_predictions
p2 = scatter(train_predictions, train_residuals,
xlabel="Predicted Values",
ylabel="Residuals",
title="Residual Analysis",
alpha=0.6,
color=:blue,
label="Training",
markersize=3)
scatter!(p2, test_predictions, test_residuals,
alpha=0.6,
color=:red,
label="Test",
markersize=3)
hline!(p2, [0], color=:black, linestyle=:dash, linewidth=2, label="Zero Line")
# 3. Feature importance analysis (correlation with target)
feature_names = ["ρ", "q", "A", "m"]
correlations = [cor(x_train[:, i], y_train) for i in 1:4]
p3 = bar(feature_names, abs.(correlations),
xlabel="Features",
ylabel="Absolute Correlation with Target",
title="Feature Importance",
color=:lightblue,
legend=false)
# Combine plots
final_plot = plot(p1, p2, p3, layout=(3,1), size=(1000, 800))
savefig(final_plot, "physical_dimensionality_analysis.png")
println("Analysis plot saved as 'physical_dimensionality_analysis.png'")
# 5. Detailed comparison with true relationship
println("=== Detailed Comparison with True Relationship ===")
# Calculate predictions using the true relationship
true_predictions_train = -x_train[:, 1] .* x_train[:, 2] .* x_train[:, 3] ./ x_train[:, 4]
true_predictions_test = -x_test[:, 1] .* x_test[:, 2] .* x_test[:, 3] ./ x_test[:, 4]
# Compare evolved vs true relationship
comparison_plot = plot(layout=(1,2), size=(1000, 400))
# Training comparison
scatter!(comparison_plot[1], true_predictions_train, train_predictions,
xlabel="True Relationship Predictions",
ylabel="Evolved Expression Predictions",
title="Training: Evolved vs True Relationship",
alpha=0.6,
color=:blue,
legend=false,
markersize=3)
# Test comparison
scatter!(comparison_plot[2], true_predictions_test, test_predictions,
xlabel="True Relationship Predictions",
ylabel="Evolved Expression Predictions",
title="Test: Evolved vs True Relationship",
alpha=0.6,
color=:red,
legend=false,
markersize=3)
# Perfect agreement lines
for i in 1:2
min_val = i == 1 ? minimum(true_predictions_train) : minimum(true_predictions_test)
max_val = i == 1 ? maximum(true_predictions_train) : maximum(true_predictions_test)
plot!(comparison_plot[i], [min_val, max_val], [min_val, max_val],
color=:black, linestyle=:dash, linewidth=2)
end
savefig(comparison_plot, "evolved_vs_true_comparison.png")
println("Evolved vs true relationship comparison saved as 'evolved_vs_true_comparison.png'")
# Calculate agreement metrics
train_agreement = cor(true_predictions_train, train_predictions)
test_agreement = cor(true_predictions_test, test_predictions)
println("Agreement with true relationship:")
println(" Training correlation: $(round(train_agreement, digits=6))")
println(" Test correlation: $(round(test_agreement, digits=6))")
println()
Real-World Applications
1. Fluid Dynamics
# Discovering relationships in fluid mechanics
# Example: Pressure drop in pipe flow
feature_dims_fluid = Dict{Symbol,Vector{Float16}}(
:x1 => Float16[0, 0, 0, 0, 0, 0, 0], # Friction Factor
:x2 => Float16[1, -3, 0, 0, 0, 0, 0], # ρ (density)
:x3 => Float16[0, 1, -1, 0, 0, 0, 0], # v (velocity)
:x4 => Float16[0, 1, 0, 0, 0, 0, 0], # L (length)
:x5 => Float16[0, 1, 0, 0, 0, 0, 0] # D (diameter)
)
# Target: Pressure [ kg m⁻¹ s⁻²]
target_dim_fluid = Float16[1, -1, -2, 0, 0, 0, 0] # ΔP (pressure drop)
2. Heat Transfer
# Discovering heat transfer
# Example: Heat flux
feature_dims_heat_flux = Dict{Symbol,Vector{Float16}}(
:x1 => Float16[1, 0, -3, 0, -1, 0, 0], # h (heat transfer coefficient)
:x2 => Float16[0, 0, 0, 0, 1, 0, 0], # ΔT (temperature difference)
)
# Target: Heat flux q = h⋅ΔT [W m⁻²]
target_dim_heat_flux = Float16[1, 0, -3, 0, 0, 0, 0]
3. Electromagnetic Theory
# Discovering electromagnetic relationships
# Example: Electromagnetic wave propagation
feature_dims_em = Dict{Symbol,Vector{Float16}}(
:x1 => Float16[0, 1, -1, 0, 0, 0, 0], # c (speed of light)
:x2 => Float16[0, 0, -1, 0, 0, 0, 0], # f (frequency)
:x3 => Float16[0, 1, 0, 0, 0, 0, 0], # λ (wavelength)
:x4 => Float16[-1, -3, 4, 0, 2, 0, 0], # ε₀ (permittivity)
:x5 => Float16[1, 1, -2, 0, -2, 0, 0], # μ₀ (permeability)
)
# Target: Speed [m s⁻¹]
target_dim_em = Float16[0, 1, -1, 0, 0, 0, 0]
Benefits of Dimensional Constraints
1. Physical Plausibility
- Ensures evolved expressions make physical sense
- Prevents dimensionally inconsistent solutions
- Maintains scientific validity
2. Search Space Reduction
- Eliminates large portions of invalid solution space
- Focuses search on physically meaningful expressions
- Improves convergence efficiency
3. Interpretability
- Results have clear physical meaning
- Easier to validate against known physics
- Facilitates scientific understanding
4. Robustness
- Reduces overfitting to specific datasets
- Improves generalization to new conditions
- Increases confidence in extrapolation
Common Challenges and Solutions
Challenge 1: Complex Dimensional Relationships
Problem: Some physical relationships involve complex dimensional dependencies Solution: Use hierarchical approach, break into simpler sub-problems
Challenge 2: Dimensionless Numbers
Problem: Important physics often involves dimensionless groups Solution: Include dimensionless combinations as derived features
Challenge 3: Unit System Consistency
Problem: Mixed unit systems can cause confusion Solution: Standardize on SI units throughout analysis
Challenge 4: Computational Overhead
Problem: Dimensional checking adds computational cost Solution: Optimize dimensional propagation algorithms, use caching
Physical dimensionality constraints in GeneExpressionProgramming.jl provide a powerful tool for discovering physically meaningful relationships while ensuring scientific validity. This feature makes the package particularly valuable for scientific and engineering applications where physical laws must be respected.
Next: Tensor Regression