I recently wrote a post of quotes from Albert Einstein on my blog. While researching it, I became curious about the Theory of Everything and whether Data Science could be applied to take a step toward advancing it. I thought this would make an interesting thought exercise!

These quotes from Einstein particularly caught my attention: 

  • “Quantum mechanics is certainly interesting. But an inner voice tells me that it is not yet the real thing. The theory says a lot, but does not really bring us any closer to the secret of the ‘old one’. I, at any rate, am convinced that He is not playing at dice.”
  • “I cannot seriously believe in it [quantum theory] because the theory cannot be reconciled with the idea that physics should represent a reality in time and space, free from spooky actions at a distance.”
  • “People like us, who believe in physics, know that the distinction between past, present and future is only a stubbornly persistent illusion.”
  • “I believe in intuitions and inspirations. I sometimes feel that I am right. I do not know that I am.”

And then I thought, maybe some principles from Data Science can help. Here are some Data Science constructs that could help:

  • Logistic Regression 
  • Correlation Matrix 
  • Logarithmic Transformation 
  • Linear Regression 

We start with our equations E = mc² and E = hf (which is also E = hc/λ). For this exercise, I treat E = mc² as applying to large masses and E = hc/λ as applying to infinitesimally small particles and waves. This looks like a perfect case for Logistic Regression because the two equations act as discrete classifications: the outcome is binary, either 1 or 0. They also represent boundary conditions, so we can set each one to zero to solve them.

The E = mc² part of the equation stands alone and is correct as-is: for large masses we get a value, and for infinitesimally small masses it goes to zero. The E = hc/λ part gives a value for infinitesimally small masses and waves. For large masses, however, the wavelength goes to zero, which creates a division-by-zero problem. So we need a coefficient based on mass, applied along with hf, that lets this term go to zero for large masses.

We also have to apply a Logarithmic Transformation in order to analyze the data, since it is skewed across many orders of magnitude. We can then use a Correlation Matrix to test different candidate mathematical constructs and, finally, run a Linear Regression on the best candidate.
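To make the log transformation and correlation matrix steps more concrete, here is a minimal Python sketch. Everything specific in it is an assumption for illustration only: SI units, base-10 logarithms, a sample of masses at each power of ten, and the candidate constructs themselves, which are hypothetical placeholders rather than the ones I actually explored.

```python
import math

C = 299_792_458.0  # speed of light in m/s (SI units are an assumption)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical sample: one mass (in kg) per order of magnitude.
masses = [10.0 ** k for k in range(-17, 15)]

# Log transformation: puts values that differ by ~31 orders of
# magnitude onto a scale where they can be compared directly.
log_mass = [math.log10(m) for m in masses]
log_energy = [math.log10(m * C ** 2) for m in masses]

# Hypothetical candidate constructs (placeholders, not the real candidates).
candidates = {
    "log_mass": log_mass,
    "log_energy": log_energy,
    "inverse_log_energy": [1.0 / x for x in log_energy],
    "squared_log_energy": [x ** 2 for x in log_energy],
}

# Correlation matrix: pairwise Pearson correlation between all candidates.
names = list(candidates)
matrix = {a: {b: pearson(candidates[a], candidates[b]) for b in names}
          for a in names}

for a in names:
    print(a.ljust(20), "  ".join(f"{matrix[a][b]:+.2f}" for b in names))
```

Each row of the printed matrix shows how strongly one candidate moves with the others. A candidate that correlates strongly with the quantity of interest would be the one to carry forward into a linear regression.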

After some time exploring candidates, I came upon a promising one for the coefficient: 1 / (1 - log(mc²))². I used a small data set in which I varied the mass from 1E-17 to 1E+14. I plotted the logarithmic transformation of the coefficient in Tableau and got an R-squared of 0.85 and a p-value of 0.0004, which I think is pretty good for a hunt for a needle in a haystack.
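For anyone who wants to experiment with this coefficient outside Tableau, here is a rough Python sketch. It rests on several assumptions that are not stated above: base-10 logarithms, SI units (kilograms and meters per second), one sample mass per power of ten, and a straight-line fit of the log-transformed coefficient against the log of the mass. The R-squared it prints depends on those choices, so it should not be expected to match the Tableau figures.

```python
import math

C = 299_792_458.0  # speed of light in m/s (SI units are an assumption)

def coefficient(mass_kg):
    """Candidate coefficient 1 / (1 - log10(m c^2))^2.

    The base-10 logarithm is an assumption; the expression is undefined
    where log10(m c^2) equals 1, i.e. where m c^2 equals 10.
    """
    return 1.0 / (1.0 - math.log10(mass_kg * C ** 2)) ** 2

def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, plus R-squared."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Hypothetical sample: masses from 1E-17 to 1E+14, one per power of ten.
masses = [10.0 ** k for k in range(-17, 15)]

# Log-transform both axes before fitting the line.
x = [math.log10(m) for m in masses]
y = [math.log10(coefficient(m)) for m in masses]

slope, intercept, r_squared = linear_fit(x, y)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r_squared:.3f}")
```

Because the coefficient blows up where mc² equals 10, a sample point that lands close to that mass will pull the fit strongly, so it is worth trying a few different sets of sample masses.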