We are collecting more and more useful optimization statistics in cknowledge.org/repo, and DNNs now work via CK. It would therefore be interesting to apply DNNs to find useful features of programs, compiler IR, binaries, data sets, OS and hardware that could predict such optimizations. This could make a nice internship or GSoC project.
See the related ticket for MILEPOST GCC: ctuning/reproduce-milepost-project#2
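
To make the idea more concrete, here is a minimal sketch of the kind of predictor this could start from. Everything in it is an assumption for illustration: the two features (`loop_density`, `memory_op_ratio`), the toy data, and the two flag sets are hypothetical and do not come from cknowledge.org/repo, and the code uses no CK API. It is a tiny one-hidden-layer network in plain Python rather than a real DNN framework, just to show the shape of the problem: feature vector in, predicted optimization out.

```python
import math
import random

# Hypothetical toy data (NOT taken from the CK repository):
# each row is (loop_density, memory_op_ratio) for one program.
FEATURES = [(0.1, 0.2), (0.2, 0.8), (0.3, 0.5),
            (0.8, 0.3), (0.9, 0.7), (0.7, 0.1)]
# Hypothetical label: index of the flag set that was fastest for that program.
LABELS = [0, 0, 0, 1, 1, 1]
# Illustrative mapping from label index to compiler flags.
FLAGS = ["-O2", "-O3 -funroll-loops"]


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def _forward(model, x):
    """Return (hidden activations, probability of label 1) for one input."""
    w1, b1, w2, b2 = model
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    p = _sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)
    return h, p


def train(features, labels, hidden=4, lr=0.1, epochs=3000, seed=0):
    """Train a 1-hidden-layer network with per-sample gradient descent."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_in = len(features[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            h, p = _forward((w1, b1, w2, b2), x)
            err = p - y  # gradient of cross-entropy loss w.r.t. the output logit
            for j in range(hidden):
                # Backpropagate through tanh using the pre-update output weight.
                grad_h = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                for i in range(n_in):
                    w1[j][i] -= lr * grad_h * x[i]
                b1[j] -= lr * grad_h
            b2 -= lr * err
    return w1, b1, w2, b2


def predict_label(model, x):
    """Return the predicted label index (0 or 1) for a feature vector."""
    _, p = _forward(model, x)
    return 1 if p >= 0.5 else 0


def predict_flags(model, x):
    """Return the predicted flag set for a feature vector."""
    return FLAGS[predict_label(model, x)]
```

Usage would look like `model = train(FEATURES, LABELS)` followed by `predict_flags(model, (0.95, 0.5))`. In a real project the feature vectors would come from the program, IR, binary, data set, OS and hardware descriptions mentioned above, and the interesting part would be letting a deeper network discover which of them matter.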