Random work-related updates
There is life besides covid19.
We've started blogging a bit as a group on our website, which is a development I am happy about.
http://big-data-biology.org/blog/2020-04-29-NME/ : The famous Tank story of a neural network that purported to tell apart American from Soviet tanks, but just classified on the weather (or time of day) is probably an urban legend, but this one is true. What I like about this type of blogging is that this is the type of story that takes a lot of time to unravel, but does not make it to the final manuscript and gets lost. [same story as Twitter thread]. Most of the credit for this post should go to Célio.
http://big-data-biology.org/blog/2020-04-10-cryptic : A different type of blogpost: this is a work-in-progress report on some our work. We are still far away from being able to turn this story into a manuscript, but, in the meanwhile, putting it all in writing and out there may accelerate things too. Here, Amy and Célio share most of the credit.
2. We've updated the macrel preprint, which includes a few novel things. We have also submitted it to a journal now, so fingers crossed! In parallel, macrel has been updated to version 0.4, which is now the version described in the manuscript.
3. NGLess is now at version 1.1.1. This is mostly a bugfix release compared to v1.1.0.
4. We submitted two abstracts to ISMB 2020: The first one focuses on macrel, while the second one is something we should soon blog about: we have been looking into the problem of predict small ORFs (smORFs) more broadly. For this we took the dataset from (Sberro et al., 2019) which had used conservation signatures to identify real smORFs in silico and treated it as a classification problem: is is possible to based solely on the sequence (including the upstream sequence) to identify whether a smORF is conserved or not (which we are taking as a proxy for it being a functional molecule).