Today, I added another small feature to our habitable zone model. I know that downloading the whole program is not very convenient; it also only works on Windows and only includes stars within 10 parsecs. To get around some of these issues and to offer an on-the-go alternative, I decided to add a social media component to the model. To use it, just tweet @habitablebot followed by the name of the star whose habitable zone you would like to see. It will then tweet back a picture of the habitable zone with and without Earth included for comparison. If you have any problems using it, or have any questions, feel free to ask below.
Since I have been out of town for the past week, I have not been able to work on the main project or post on the blog as frequently. Instead, I decided to do a week-long mini project that fits in my limited time and still relates to exoplanets and their discovery.
The Problem: One of the ways to detect an exoplanet is by looking at the light curve of a star (a plot of the brightness of a star vs time). Theoretically, the light curve of a perfect, glowing sphere with no imperfections would be a flat line (the brightness would always be constant). If there were a planet around the star, the light curve would dip whenever the planet passed between the star and the Earth, because the planet blocks some of the light and causes the brightness to decrease. In reality, however, it is much more difficult. Stars, being huge balls of fusing hydrogen, are not the most stable objects and can have natural fluctuations in their brightness. Also, planets, being so small compared to their stars, do not cause a very large dip in the brightness. With these two problems, it becomes difficult to determine whether fluctuations in brightness are caused by natural processes taking place in the star or by a transiting planet.
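To put rough numbers on how small the dip is, here is a quick sketch. It assumes the simplest possible picture: the planet is a dark disc crossing a uniformly bright stellar disc, so the fraction of light blocked is the ratio of the two disc areas. The radii are standard reference values, and the function name is just something made up for illustration.

```python
# Fraction of starlight blocked during a transit, assuming a dark planet disc
# in front of a uniformly bright star disc (no limb darkening, no noise).

SUN_RADIUS_KM = 695_700.0
EARTH_RADIUS_KM = 6_371.0
JUPITER_RADIUS_KM = 69_911.0


def transit_depth(planet_radius_km, star_radius_km):
    """Ratio of the planet's disc area to the star's disc area."""
    return (planet_radius_km / star_radius_km) ** 2


earth_depth = transit_depth(EARTH_RADIUS_KM, SUN_RADIUS_KM)
jupiter_depth = transit_depth(JUPITER_RADIUS_KM, SUN_RADIUS_KM)

print(f"Earth-size planet, Sun-like star:   {earth_depth:.4%} dip")
print(f"Jupiter-size planet, Sun-like star: {jupiter_depth:.2%} dip")
```

A Jupiter-size planet dims a Sun-like star by about 1%, while an Earth-size planet dims it by less than 0.01%, which is easily hidden by the star's own flickering.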
The Solution: The approach that I am currently trying is what is called a neural network, which is a type of machine learning modeled on how the brain and its neurons work. I think the best way to understand what it does is to have a visual (see below).
So, to break the visual down, there are four main components: the input layer, the hidden layer, the output layer, and the connections (weights). The input layer is simply the starting data. In my case, I have 3197 brightness measurements, so instead of 3 input neurons, I have 3197. The input data is then sent through the weights (shown as the arrows in the visual that connect each node of one layer with each node of the next layer). All a weight does is multiply the individual input value by a number and pass it on to the next node. When the data gets to the hidden layer, each node adds up the numbers it receives (one from each node in the previous layer). It then applies a function* which spits out another number. This number is then passed on to the next set of weights, which have the same purpose as before. Finally, the data reaches the output layer, which does the same thing as the hidden layer, except this time the number that is produced is the final result. In my case, a 1 means that there is an exoplanet, and a 0 means that there is no planet.
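The steps above can be sketched in a few lines of plain Python. This is a toy illustration and not the actual project code: it uses 5 inputs and 3 hidden nodes instead of 3197 inputs, the brightness values are made up, the weights are random, and the names (`layer`, `forward`, and so on) are hypothetical. It also leaves out bias terms, since the description above does not mention them.

```python
import math
import random


def sigmoid(x):
    """Squash any real number into the range 0 to 1."""
    x = max(-60.0, min(60.0, x))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights):
    """One layer: every node sums its weighted inputs, then applies the sigmoid.

    weights[j][i] is the weight on the arrow from input i to node j.
    """
    return [sigmoid(sum(w * x for w, x in zip(node_weights, inputs)))
            for node_weights in weights]


def forward(brightness, hidden_weights, output_weights):
    """Run one light curve through the network and return the final number."""
    hidden = layer(brightness, hidden_weights)
    return layer(hidden, output_weights)[0]


# Toy sizes for illustration only.
N_INPUTS, N_HIDDEN = 5, 3
rng = random.Random(0)
hidden_weights = [[rng.uniform(-1, 1) for _ in range(N_INPUTS)]
                  for _ in range(N_HIDDEN)]
output_weights = [[rng.uniform(-1, 1) for _ in range(N_HIDDEN)]]

light_curve = [1.00, 0.99, 0.95, 0.99, 1.00]  # made-up brightness values
score = forward(light_curve, hidden_weights, output_weights)
print(score)  # between 0 and 1, but meaningless until the weights are trained
```

With random weights the output is essentially a guess; closer to 1 would mean "planet" and closer to 0 would mean "no planet" once the network has been trained.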
The interesting part of the neural network comes in the training, or ‘learning’. At the very beginning, the model starts with random weights. This means that if I ran a test light curve through the model, it would essentially give a random answer. The real learning part of the network comes in the weight optimization. I give it a set of training data that contains the light curve and the correct answer for 5087 stars. It then uses this data to go back and tune the initial weights until it gives the right answer for the training data. In doing so, it is essentially ‘learning’ how to look at a light curve and spit out whether or not there is a planet. One of the common ways to optimize these weights involves calculating an error function (the difference between the right answer and the calculated answer) and then minimizing that error (if you are curious about this, I would google backpropagation, or I can try my best to explain it). Once the weights have been tuned, you can give the network a brand new star that it has never seen before, and it should be able to tell you whether or not there is a planet with fairly good accuracy.
I would like to comment on the accuracy of the neural network I created, but it is still in the training stage (optimizing and changing the weights), which takes quite a bit of time, especially with large data sets like mine. For this particular problem, I have roughly 5 million weights that have to be tuned (it’s been running for 18 hours and still isn’t done). Once it does finish, however, I will make another post detailing whether or not it works.
Sorry for the long post, but I have not had as much time to document each step, so I had to put them all into this one. This topic is also a confusing one that people spend years learning and understanding. Because of this, I really do encourage questions, because I’m sure that I can explain things in more detail than I did in this quick overview.
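For anyone who wants to see what "calculate the error, then adjust the weights to shrink it" looks like, here is a minimal sketch of backpropagation with gradient descent on a tiny network. Everything in it is an assumption made for illustration: the four made-up training curves (written as changes in brightness from the average, so a flat star is near zero), the squared-error function, the learning rate of 0.5, the network size, and all of the function names. The real data set has 5087 stars and far more weights.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    """Return the hidden-layer values and the final output for one light curve."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output


def total_error(data, w_hidden, w_out):
    """Sum of squared differences between right answers and network answers."""
    return sum(0.5 * (forward(x, w_hidden, w_out)[1] - answer) ** 2
               for x, answer in data)


def train_step(x, answer, w_hidden, w_out, rate):
    """Nudge every weight slightly in the direction that lowers the error."""
    hidden, output = forward(x, w_hidden, w_out)
    # Blame assigned to the output node (error slope times sigmoid slope).
    delta_out = (output - answer) * output * (1.0 - output)
    # Pass the blame back to each hidden node, using the old output weights.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    for j, h in enumerate(hidden):
        w_out[j] -= rate * delta_out * h
    for j, row in enumerate(w_hidden):
        for i, xi in enumerate(x):
            row[i] -= rate * delta_hidden[j] * xi


# Made-up training set: (brightness changes, right answer).
data = [
    ([0.00, 0.01, -0.01, 0.00, 0.01], 0),   # flat, no planet
    ([0.01, -0.01, 0.00, 0.01, 0.00], 0),   # flat, no planet
    ([0.00, -0.50, -1.00, -0.50, 0.00], 1),  # dip, planet
    ([0.00, -0.40, -0.90, -0.60, 0.01], 1),  # dip, planet
]

rng = random.Random(1)
w_hidden = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(3)]
w_out = [rng.uniform(-1, 1) for _ in range(3)]

error_before = total_error(data, w_hidden, w_out)
for epoch in range(2000):
    for x, answer in data:
        train_step(x, answer, w_hidden, w_out, rate=0.5)
error_after = total_error(data, w_hidden, w_out)

print(f"error before training: {error_before:.4f}")
print(f"error after training:  {error_after:.4f}")
```

The same loop is what takes so long on the real problem: every pass touches every weight for every star in the training set.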
*The function is a sigmoid function that outputs a continuous result from 0-1.
Today we were going to meet with Dr. Karalidi, but she was busy, so we did not get the chance. Instead I took the opportunity to display our data in another way. I plotted the temperature of the star vs the inner habitable radius, the outer habitable radius, and both at the same time. These plots showed that there is a pretty strong correlation between the temperature of the star and the distance of the habitable zone, although there are some outliers (my guess is that they are hot stars, but are quite small, so their habitable zone is still close. This also leads me to believe that the radius of the star is more important than the temperature). Plotting the two boundaries together shows that as the temperature of the star increases, the ratio between the inner and outer boundaries stays roughly the same, as I noticed a couple of posts ago. I repeated the same procedure with the radius of the star and found results similar to those for the temperature.
Also, I finally got the program uploaded to a site that allows applications. You can download it from: https://mega.nz/#F!Fqw3QbLR!STHsQYGtyNvF_DJu6V3zrw
Today I added the last feature that I had planned for the model: displaying information about the star on the visual itself and not just in the menu. While the program is definitely not perfect, I think that it is polished enough to release for download. To run it, just download the folder from the link below and run the file named visual. It may take a couple of seconds to start, but it should eventually pop up. It should work on all versions of Windows, but I have not tested it on Mac OS. If you encounter any problems or have any suggestions for improvements, don’t hesitate to comment, and I will do my best to fix or improve the model.
Because we have finished the model, we will be meeting with Dr. Karalidi later this week to see what our next steps should be. In the meantime, I will be trying to fix the little things that are wrong with the program (loads a little slow, scrolling doesn’t always work with laptop trackpad, etc).
Over the past few days I have mostly been working on improving the functionality of the model, as well as checking the peculiarity described in the last post. From what I can tell, the fact that the inner boundary is halfway between the star and the outer boundary is an intended part of the equations and not a mistake made during the creation of the model. Beyond coming to this conclusion, I have successfully implemented a button that overlays the orbit of the Earth on top of the habitable zone for comparison (shown below). I also added labels to the star menu so that it is possible to know which row of numbers corresponds to which property of the star. Clicking these labels now sorts by the corresponding category (clicking the name header sorts stars alphabetically, clicking the spectral type header sorts by spectral type, etc.).
I think that there are one or two more features that I would still like to add, but other than those, I am getting pretty close to finishing the model. Hopefully sometime next week I will be able to finish it and release it for download so that people can mess around with it as much as they want.
Yesterday I was able to get a menu up and running (pictured below) to make it easier to switch the visual from one star to another. The menu lists the name of the star, the spectral type, the parallax, and the B and V magnitudes. While these are not labeled yet, I am hoping to do so in the next few days. I also intend to add sorting functionality (sort by distance, brightness, spectral type, etc).
While I was looking through some of the stars, I came across something quite interesting. While the boundaries of the habitable zone were different for every star, the inner boundary was almost always about half the distance of the outer boundary. In the next few days I am going to work on validating that this should be true and is not just a peculiarity of the model and equations that we used. In the meantime, this means that when changing from star to star in the visual, it might not look like the boundaries are changing, just the axes (shown below).
Even though these stars have vastly different boundaries (0.7-1.38 AU for star 1 and 0.07-0.14 AU for star 2), the inner boundary is half the distance of the outer boundary, which makes the zones look the same unless you look at the axes. Assuming that the inner boundary is supposed to have this property, I plan on adding a button that toggles the Earth’s orbit on and off. This will hopefully add some perspective to the visual so that not every star looks exactly the same.
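One way a constant ratio like this can arise is if both boundaries are defined by fixed flux limits. With the inverse-square law, each boundary then sits at a distance proportional to the square root of the star's luminosity, so the luminosity cancels out of the ratio. The sketch below illustrates this. The two flux limits are hypothetical values picked so the ratio comes out to one half; they are not the limits from the equations used in the model.

```python
import math

# Hypothetical flux limits, in units of the flux Earth receives from the Sun.
# These are illustrative values, not the ones used in the actual model.
INNER_FLUX_LIMIT = 2.0
OUTER_FLUX_LIMIT = 0.5


def habitable_zone_au(luminosity_solar):
    """Distances (AU) where the star's flux equals each limit.

    Flux in Earth units at d AU from a star of luminosity L (solar units)
    is L / d**2, so the boundary is at d = sqrt(L / limit).
    """
    inner = math.sqrt(luminosity_solar / INNER_FLUX_LIMIT)
    outer = math.sqrt(luminosity_solar / OUTER_FLUX_LIMIT)
    return inner, outer


for luminosity in (1.0, 0.01):
    inner, outer = habitable_zone_au(luminosity)
    print(f"L = {luminosity}: {inner:.2f} to {outer:.2f} AU, "
          f"ratio {inner / outer:.2f}")
```

The ratio is sqrt(OUTER_FLUX_LIMIT / INNER_FLUX_LIMIT) no matter what luminosity goes in, which is why only the axes change from star to star.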
The last week has been pretty slow, but we have managed to take the habitable zone and show it visually, because just the numbers themselves mean very little to most people. We have also iterated our program through all the stars that we have data for and have successfully calculated the habitable zone for all the stars within 10 parsecs. The next step that we have to take is to add some sort of menu to our visual so that it is easier to change from star to star, because currently it is necessary to enter the numbers by hand. Below are some of the visuals that we have made so far.
This visual is a great demonstration of what I mean when I say that the numbers do not mean a whole lot by themselves. At first glance this habitable zone looks very similar to that of other stars. However when you look at the next visual which shows the zone in comparison to the orbit of Earth, you can see that the zone is actually quite small, which is a bit difficult to immediately understand from the first visual. Because of this, our visual will usually include the orbit of the Earth so that there is something to reference that people will have a bit more experience with.
Today I finished the second model that I mentioned in the last post. After doing some tests on a couple of stars, it looks like this one is much better than the last one (and required much less math, time, graphs, pretty much everything). Our approach was to take the aforementioned B and V band magnitudes. We took the difference between them (called the color index) and used this number to calculate the temperature of the star. We then used just the V magnitude, the distance of the star, and what’s called a bolometric correction (pretty much just a constant that converts from the visual range to the whole range of wavelengths) to calculate the luminosity of the star. Finally, we used an equation published in a paper that takes the temperature and luminosity that we just calculated and gives the inner and outer edges of the habitable zone. I think this model is much better than the last mostly because the equation from the paper takes into account the greenhouse effect, which has a large effect on the boundaries. Also, because this model only uses the B and V magnitudes, it will be much easier to use than the model that requires the data set to be complete. I will update in a few days once we run all of the stars through the model and get to see how well it works on the whole set.
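The first two steps (color index to temperature, and V magnitude to luminosity) can be sketched as follows. Two things here are assumptions and not necessarily what our program uses: the temperature fit is the one published by Ballesteros (2012), and the Sun's absolute bolometric magnitude is taken as 4.74. The function names and the Sun-like example values are made up for illustration. The sketch stops before the habitable zone equation itself, since that comes from the paper.

```python
import math

SUN_ABS_BOL_MAG = 4.74  # assumed absolute bolometric magnitude of the Sun


def temperature_from_color(b_mag, v_mag):
    """Estimate temperature (K) from the B-V color index.

    Uses the Ballesteros (2012) fit, which is an assumption here.
    """
    bv = b_mag - v_mag
    return 4600.0 * (1.0 / (0.92 * bv + 1.7) + 1.0 / (0.92 * bv + 0.62))


def luminosity_from_v(v_mag, distance_pc, bolometric_correction):
    """Luminosity in solar units from V magnitude, distance, and a correction."""
    # Apparent magnitude -> absolute magnitude (what it would be at 10 parsecs).
    abs_v = v_mag - 5.0 * math.log10(distance_pc / 10.0)
    # Visual band only -> all wavelengths.
    abs_bol = abs_v + bolometric_correction
    # Every 2.5 magnitudes is a factor of 10 in brightness.
    return 10.0 ** ((SUN_ABS_BOL_MAG - abs_bol) / 2.5)


# Example: a Sun-like star placed 10 parsecs away (illustrative values).
temperature = temperature_from_color(b_mag=5.48, v_mag=4.83)
luminosity = luminosity_from_v(v_mag=4.83, distance_pc=10.0,
                               bolometric_correction=-0.09)
print(f"temperature: {temperature:.0f} K")
print(f"luminosity:  {luminosity:.2f} solar")
```

For the Sun-like example, this gives a temperature close to 5800 K and a luminosity close to 1 solar, which is a useful sanity check before running real stars through it.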
So today we finished the first model. To fix the problem from the last post, we ended up using the Stefan-Boltzmann law to relate temperature and flux. We assumed that the inner boundary is where the temperature is 373 K (the boiling point of water) and that the outer boundary is where it is 273 K (the freezing point of water). We also assumed that a planet would be similar to Earth and would reflect 30% of the flux that it receives (the planet’s albedo). Using this reflection, the temperature assumptions, and the Stefan-Boltzmann law, we were then able to define the boundaries of the habitable zone. After doing that, we could reuse the calculations from before, and this time we got very good results for the inner boundary and alright results for the outer boundary (compared to other predictions of the zone). I think the outer boundary results were somewhat poor because our model is pretty simplistic and only takes into account the incoming flux from the star, while in reality an entire planet would do a better job of conserving heat and thus could still have liquid water farther out than our model predicts. We are going to make another model soon that we will compare to this one. Hopefully the new model will be just as accurate, if not more, but require less data, because currently not all of the stars have the data required for the first model.
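Here is a minimal sketch of that calculation, applied to the Sun as a test case. It reads the description literally: the flux that is not reflected is set equal to sigma times T^4. The `geometry_factor` parameter is something added for this sketch; a common textbook variant sets it to 4 to average the absorbed light over a rotating sphere, and the post does not say which version our model used. The constants are standard values, and the function names are hypothetical.

```python
import math

SIGMA = 5.670374419e-8        # Stefan-Boltzmann constant, W m^-2 K^-4
SOLAR_LUMINOSITY = 3.828e26   # W
AU = 1.495978707e11           # m


def required_flux(temperature_k, albedo=0.3, geometry_factor=1.0):
    """Stellar flux (W/m^2) that keeps a planet at temperature_k.

    Energy balance assumed here:
        (1 - albedo) * flux / geometry_factor = SIGMA * T**4
    """
    return geometry_factor * SIGMA * temperature_k ** 4 / (1.0 - albedo)


def distance_for_flux(luminosity_w, flux):
    """Distance (m) at which a star of this luminosity delivers this flux."""
    return math.sqrt(luminosity_w / (4.0 * math.pi * flux))


inner_au = distance_for_flux(SOLAR_LUMINOSITY, required_flux(373.0)) / AU
outer_au = distance_for_flux(SOLAR_LUMINOSITY, required_flux(273.0)) / AU
print(f"inner boundary (373 K): {inner_au:.2f} AU")
print(f"outer boundary (273 K): {outer_au:.2f} AU")
```

Whatever geometry factor is chosen, the ratio of the two boundaries depends only on the two temperatures: it is (273/373)^2, or a little over one half.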
So today I finished up the method that I was describing in the last post. I had initially started to approximate the curves with an equation that I could then integrate. I realized that it would be easier and probably more accurate to just use a trapezoidal approximation, since I was given lots of data points. After getting the area under each curve, I multiplied it by the flux density for the respective filter and added the results up. I then converted the flux here at the Earth to the flux near the star. I ended up getting mixed results. On the one hand, it gave a very good approximation of the inner boundary for the star Proxima Centauri (0.025 AU compared to other estimates of 0.023 AU), but it does not give good estimates of the outer boundary for Proxima Centauri or for another star, Tau Ceti.
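The two numerical pieces of this method, the trapezoidal approximation and the conversion from flux at the Earth to flux near the star, can be sketched like this. The filter curve and the flux numbers in the example are made up purely to show the mechanics, and the function names are hypothetical.

```python
def trapezoid_area(xs, ys):
    """Area under the curve through the points (xs[i], ys[i]).

    Each pair of neighboring points forms a trapezoid whose area is the
    average of the two heights times the width.
    """
    area = 0.0
    for i in range(1, len(xs)):
        area += 0.5 * (ys[i] + ys[i - 1]) * (xs[i] - xs[i - 1])
    return area


def distance_for_flux(flux_at_earth, star_distance, target_flux):
    """Distance from the star where its flux equals target_flux.

    Flux falls off as 1 / distance**2, so
        target_flux / flux_at_earth = (star_distance / d)**2.
    The result is in the same units as star_distance.
    """
    return star_distance * (flux_at_earth / target_flux) ** 0.5


# Made-up filter response curve (wavelength in nm, transmission 0 to 1).
wavelengths = [400.0, 450.0, 500.0, 550.0, 600.0]
transmission = [0.0, 0.5, 1.0, 0.5, 0.0]
print(trapezoid_area(wavelengths, transmission))

# Made-up example: flux is 100 times stronger at one tenth of the distance.
print(distance_for_flux(flux_at_earth=1.0, star_distance=10.0, target_flux=100.0))
```

Because the boundary distance depends on the square root of the measured flux, an error in the measured flux carries straight through to the boundary distance.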
I have many theories as to why the estimation is so poor, but I mainly think that it is a combination of two things. First, the current estimates of the habitable zones for these stars were probably calculated with a different (most likely more complex) method. Second, our model seems to be very sensitive to small changes in the data, which would mean that small errors in the magnitudes measured by the telescopes could have a large impact. The other possibility is that the flux requirement I found for the outer boundary is wrong. When I worked backward from the current estimates of the habitable zones, I found the flux at the outer edge of the zone to be 290 and 410 W/m^2. The fact that these numbers are so close to each other might indicate that the 960 W/m^2 boundary I initially found is incorrect and that the true value is in fact much lower. This has become the goal for tomorrow: to check more sources about how much flux is needed at the inner and outer boundaries of the habitable zone.