Gaussian Noises: 2008

Friday, November 7, 2008

On Learning When To Shut Up

You know what phrase I'm beginning to dread? It's "Oh, so you're saying that...". Nine times out of ten, I'm not actually saying that. Or if I was saying that, I was merely speculating.

Example:

Me: This is not giving the result I expected. I observed effect Y, so maybe it's due to Z.
Person X: Oh, so you're saying that [slight rephrasing of Z] is causing the problem.

No, I'm not stating an absolute, as indicated by my use of the word "maybe". I'm putting forward a hypothesis, which is what you do in science. But I don't appreciate you nailing me down on that hypothesis before I have even investigated it.

Maybe (just a hypothesis!) the problem is with me. These are (for the most part) busy people I'm talking to, and maybe they can't afford to spend that much time speculating anymore. So when I'm coming up with a hypothesis, they just assume I wouldn't be telling them if I hadn't already thought about it for some time.

In other words, do I need to learn when to shut up?

Monday, October 13, 2008

A Plea

Please, please, please, please... document your data.

It's not fun to get a matrix without any column or row labels. Like my old physics teacher used to say when somebody proudly proclaimed that the answer to an exercise was 5.029, "5.029 what? Elephants?". In physics, a quantity is not meaningful without a unit, and in data management, a matrix is not useful without labels.

It's not fun to know nothing about any preprocessing of the data. Has it been normalised? I guess I can check the mean and standard deviation, but what if it's only close to 0 and 1, but not exactly so? Was that some special normalisation method? Hell if I know.

It's not fun to know nothing about the experimental setup. Maybe you told me every second data point is a wildtype. Does that mean that these were results from two-colour arrays? Or two single-colour arrays? Are the wildtypes from the same time point as the mutants? Come to think of it, are these even time-course data?

It's not fun to know nothing about what the biologists* want you to find. Are they looking for similarly expressed genes or for regulators? Would they rather have a network of the knocked-out genes or of all genes? Is it worse if I give them false positives or false negatives?

So please, please, please, document your data. Maybe some time in the future I will tell you about fun things called wikis and databases, but for now even a text file would do.
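The normalisation check mentioned above can be sketched in a few lines of Python. This is only an illustration: the function name `looks_standardised`, the tolerance `tol`, and the example column are all made up here, and no tolerance can tell you which normalisation method was actually used. That is exactly why it should be written down.

```python
import statistics


def looks_standardised(values, tol=0.05):
    """Return (mean, sd, flag).

    flag is True when the mean is within tol of 0 and the sample
    standard deviation is within tol of 1. The tolerance is an
    arbitrary choice, not a standard.
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    flag = abs(mean) <= tol and abs(sd - 1.0) <= tol
    return mean, sd, flag


# Hypothetical unlabelled column: close to mean 0 and sd 1, but not exactly.
column = [-1.2, -0.4, 0.1, 0.3, 1.3]
mean, sd, flag = looks_standardised(column)
print(f"mean={mean:.3f} sd={sd:.3f} looks standardised: {flag}")
```

Whether this column counts as "normalised" depends entirely on the tolerance you pick, which is the ambiguity that one line of documentation would remove.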

*Or insert other applied science here.

Monday, September 29, 2008

The View from the Foothills

I'm finally there. Five years of hard work, first as an undergraduate and then as a Masters student, have paid off. Last Monday, I was granted my rightful place in academia... on the lowest rung of the ladder.

Yes, PhD students are a dime a dozen, even in my small institute, and despite the excitement of starting research in earnest, I can't help but feel slightly apprehensive. This may just be the result of reading too many PhD comics, but a tinge of anxiety is setting in. What if my advisor turns out to be a workaholic? What if I can't finish in the required 3 1/2 years before my funding runs out? What if my office mates are insane (they're not, I think) or my experiments all fail?

Then I remember that a million students have survived their PhD just fine before me and a million will again. I may not have the prettiest office (in fact, drab is not an inaccurate description), but at least I'm not sharing with 14 other people like my flatmate. My supervisor has only been nice to me, despite the bollocking that he gave his other PhD student last week. And my project, even though it looks daunting from here, will rest on the foundations of my Masters project, meaning that I have a reasonable idea of where to start.

So, the base camp has been established in the foothills of Mt. PhD. Only the future will show if I scale the summit triumphantly, or freeze to death in a crevasse somewhere. Boy, that metaphor took a bleak turn, didn't it?

Saturday, August 30, 2008

Research is Easier if You Make It Up

Yes, I know I haven't written in a good few months. In my defence, I have been kept quite busy by the research for my MSc project. Now that it is done, however, I'd like to share a few thoughts on my first real experience with research.

For this first post, I want to talk about what was perhaps the most humbling experience, and that was how tempting it was to cheat.

Like many research projects, my research was beset with problems. There were contradictory results, vague results, results that were the opposite of what we expected, without any indication why this happened. And often, when I got these results, I would think: "Gee, wouldn't it be nice if I could make up the results I wanted instead."

Now before you cast the first stone, let me be very clear: I did not fake any results, and I hope I never will. But it got me thinking. How easy would it really be to fake results? For my MSc project, it would have been really easy. We do not have to hand in the code (the markers may ask for it if they smell a rat, but let's assume for the sake of argument that the faked results are completely convincing), so I would not even have to write the programs. I knew how the different experiments were supposed to work, so generating some convincing results would have been easy. The only people to see the results are my supervisor and a second marker. Of those two, only my supervisor could possibly spot fake results, because the second marker is not an expert in the field. If I had gotten any fake results past my supervisor, I would basically have been home free.

You might be thinking that that's all very well for a Masters project, but surely in real research faked data would be spotted. But would it really? I agree that you would probably have difficulty faking a whole project: You'd be hard-pressed to answer questions from reviewers of your paper, and anyone trying to repeat the experiments would obviously get very different results. But what about just tweaking that one experiment that's poking a hole in your theory? That would again be very easy and would probably not be spotted unless somebody decides to repeat that exact experiment. If somebody later disproves your theory, well, you got a paper out of it, and nobody can really blame you for not spotting the flaw when all of your experiments were confirming the hypothesis.

Cheating can get even more subtle (choosing your experiments, skimping on controls, omitting results) and harder to spot. So the question is, given how easy it would be to cheat, what, other than personal integrity, is keeping scientists honest?

I believe curiosity and ambition are big factors. If you get results that contradict your hypothesis, you don't just say "Aw, crud", you get excited, because there's another problem to solve. Maybe this new problem will lead to an even bigger discovery than the one you were hoping to make. If you just fake the result, you'll probably never do really ground-breaking science. Worse yet, you might set back other scientists who will not pursue their theories because your "results" seem to have disproved them.

There's also training and your research environment. Never underestimate social conformity, which in this case is a good thing. If everybody around you is excited about research, as most scientists will be, you'll find it very hard to be the cheater, even if you're the only one who knows that your results weren't real. You'll want to be just as good as the rest, and if they can deal with contradictory results, then so can you.

Of course, this only applies if the people in your research environment let you know about the problems they are having. They may be competitive people who feel that talking about struggling with research is equivalent to showing weakness. If that is the case, I recommend reading some of the many excellent blogs from scientists who are not afraid to talk about their research issues.

One thing that is clear is that you cannot just assume that every result that is published is automatically set in stone. If you think you have a better theory, test it, and if necessary repeat an experiment that has already been done. If enough people do that, there might actually be a chance of unmasking the cheaters. And that would be another great incentive not to cheat in the first place.

Saturday, July 5, 2008

Conference Noises

Would you say that a scientist's first conference is like his first kiss*, a unique experience, never forgotten despite the fumbling and nervousness? Or is it more like the first time you went to a McDonald's: Sure, it's exciting and colourful, but after you've been a few dozen times you notice that they're all the same?

I couldn't say yet which of these is a better description, since I've only just experienced my first conference. Conference might be saying a bit much: It was a one-day symposium, and I didn't even have to leave the city.

Still, there were some memorable experiences to be made. Some were of the mundane variety: It seems that even in Britain, coffee break means coffee break, and not tea break. And don't even dare ask for water. Also, pinning your badge to your shirt is a fashion faux-pas; the correct place is discreetly on your belt.

The poster session was different from what I expected, because there were really only posters. Somehow, I always expected the poster creators to be standing next to them with proud smiles, eager to explain their science to anyone passing by. Not so here: There were posters, there were people reading the posters, and that was it.

The talks ranged from the fascinating to the mystifying. I've always been better at learning things from papers than at picking them up in lectures, so it's no surprise that I couldn't follow some of the more complicated topics. Listening to those lectures was not a waste of time, though, since at least now I know those topics exist and I can find out more about them (by reading papers!) if I want to.

The quality of the speakers varied (doesn't it always?) but some of them were very good, even inspirational. There are so many unsolved problems in bioinformatics, but these speakers were pointing the way to solving many of them.

Now for the more disappointing part of the symposium. No, not the food, that was alright. This is something that I'm willing to bet not many attendees even noticed, but it's actually a huge statistical fluke if it was random: Out of 15 speakers, not a single one was female. I'm used to gender bias in my field, especially on the informatics side, but 0 out of 15? Seriously? You're telling me that there's not a single female professor that you could have invited to talk about her research?

At least many of the people in the audience were female, but jeez!
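How much of a fluke "0 out of 15" would be depends on what share of eligible speakers are women, which is not stated here. As a rough sketch, assuming speakers were picked independently from a pool with a fixed share of women (both the independence and the shares below are assumptions, not data), the chance of an all-male line-up is (1 - share) raised to the number of speakers:

```python
def prob_no_women(n_speakers, share_women):
    """Probability that none of n_speakers is a woman, assuming each
    speaker is drawn independently from a pool in which a fraction
    share_women are women. Both assumptions are simplifications."""
    return (1.0 - share_women) ** n_speakers


# Hypothetical shares of women among potential speakers.
for share in (0.1, 0.2, 0.3):
    p = prob_no_women(15, share)
    print(f"share of women {share:.0%}: P(0 of 15) = {p:.4f}")
```

The lower the assumed share, the less surprising the result; the higher the share, the harder it is to put it down to chance.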

*With the first conference occasionally preceding the first kiss by a while.

Sunday, June 1, 2008

I Can't Remember Where I Found This

...but it's really neat. A science motivational poster:

Thursday, May 22, 2008

Fun with Proteins

I think almost anyone who has studied protein 3D structures would agree that it is a hard problem.

For the uninitiated, proteins are made out of chains of amino acids. Each amino acid consists of a backbone and a side-chain. The properties of the side-chains determine the structure that the chain will take on in 3D. For example, side-chains carrying the same electrical charge repel each other, and hence tend not to be close. Another example is hydrophobic ("water-fearing") side-chains, which tend to end up on the inside of the protein, away from the water molecules that surround a protein inside the cell.

There's more than one way to determine protein 3D structure. You can take the actual protein, crystallise it, shoot x-rays at it and work out the structure from the way the x-rays are diffracted. Or you can take all of the constraints mentioned above, encode them in a computational model and get a computer to crunch the numbers for you until it finds the optimal structure.

Or, well, you could just get people to do it by hand. For free. And have fun while they're doing it.

That's the principle behind FoldIt, a new game based on, yes, you guessed it, protein structures. The idea is that protein folding is much like a puzzle, and people love doing puzzles. So we let them fold virtual proteins, and evaluate the structures based on the constraints that we know about. Add some fun sounds when you're tugging and dragging proteins, a bonus for reducing the number of moves starting from the initial configuration, and an element of competitiveness in the form of an online ladder, and you've got a fun little game that people will actually want to play.
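To give a feel for what "evaluating a structure based on constraints" can mean, here is a toy sketch using the textbook HP lattice model, in which each residue is either hydrophobic (H) or polar (P) and sits on a 2D grid. This is emphatically not FoldIt's scoring function, which is not described here; the function name `hp_contacts` and the example sequence are made up for illustration. A fold scores one point for every pair of hydrophobic residues that are neighbours on the grid without being neighbours in the chain.

```python
def hp_contacts(sequence, coords):
    """Count H-H contacts in a 2D lattice fold (toy HP model).

    sequence: string of 'H' (hydrophobic) and 'P' (polar) residues.
    coords:   list of (x, y) grid positions, one per residue.
    Raises ValueError if the fold overlaps itself or breaks the chain.
    """
    if len(sequence) != len(coords):
        raise ValueError("need exactly one coordinate per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("two residues occupy the same grid point")

    def adjacent(a, b):
        # Neighbours on the grid: Manhattan distance of exactly 1.
        return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1

    for i in range(len(coords) - 1):
        if not adjacent(coords[i], coords[i + 1]):
            raise ValueError("consecutive residues must be grid neighbours")

    contacts = 0
    for i in range(len(sequence)):
        # Start at i + 2 to skip residues that are chain neighbours anyway.
        for j in range(i + 2, len(sequence)):
            if sequence[i] == "H" and sequence[j] == "H" and adjacent(coords[i], coords[j]):
                contacts += 1
    return contacts


# The same four-residue chain, stretched out and folded into a U.
straight = [(0, 0), (1, 0), (2, 0), (3, 0)]
folded = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(hp_contacts("HPPH", straight), hp_contacts("HPPH", folded))
```

The folded version brings the two hydrophobic ends together and so scores higher than the stretched-out one. Searching for the best-scoring fold is the hard part, whether a computer or a player does it.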

FoldIt is currently in open Beta and completely free to play. I've tried it out, and it really is a lot of fun. The online option allows you to chat with fellow folders while you're playing, and the interface is simple and intuitive. You don't even have to know anything about proteins; there's a very easy tutorial to get you up to speed, and you'll figure out soon enough what works and what doesn't.

It will be interesting to see if FoldIt players can come up with better protein foldings than a computer could. FoldIt is not the first instance of a "useful" game I've come across. I first heard about the concept, called human computation, in the context of a game called ESP that gets its users to label images with text. It makes you wonder what other arduous bioinformatics tasks we could turn into games (gene-finding anyone?).

Friday, April 18, 2008

A Cautionary Tale

Remember the asteroid that's coming close to hitting the earth in 2036? Most of us probably heard about it at some point or other, made a quick calculation to see if we would be alive then, and forgot about it. If you looked into it a bit further, you found out that NASA only gave it a 1 in 45000 chance of hitting the earth; not nearly enough to worry about.

But then, shock! horror! what if NASA were wrong? People make mistakes. It's not as if they double-checked these results.* And it's not as if the smartest minds of the planet were working for NASA.** Who could we rely on to find out if there are any problems with NASA's calculations? Oh, I know. Let's ask a 13-year-old schoolboy. It's good enough for The Times. And what do you know, the whiz kid places the risk at 1 in 450. Definitely in the "we should worry about this" category.

Except NASA never confirmed it, as stated in that article, and has in fact since denied that the boy's calculations are correct. Mark over at Good Math, Bad Math has a simple explanation why his assumptions are flawed.

I don't mean to put down Nico Marquardt, the boy who made the calculations. He obviously put a lot of effort into this, and it must have been pretty good work to convince so many people. Who I do want to put down are the many, many papers that simply picked up this story, based on its "newsworthiness", seemingly with no fact checking at all. One phone call to NASA would have cleared it all up. Is this all we can expect from Old Media?

* They did.
** They are.

Saturday, March 29, 2008

Open Reviews?

The PLoS Computational Biology journal has an editorial about open access biology papers. Most of it is the usual Web 2.0 aspirations: We need more structured data, more semantics, more killer apps taking advantage of it.

Not that I disagree, necessarily, but I've heard it all so often that it's not really registering anymore. Maybe the killer apps will come along, and maybe they won't. Web 2.0 thinking has potential for science publications, but not everything that has potential gets realised.

But towards the end of the article, there was something that caught my attention: Open review.

Certainly rating a paper would seem reasonable when done by the Faculty of 1000 (http://www.f1000biology.com), but it is not a generally accepted practice. We challenge you to rate this Editorial too. In some ways the reluctance to rate a scientific paper is strange since we suspect the same person may well rate a book on amazon.com. Another option would be to add a Digg or del.icio.us button (http://digg.com or http://del.icio.us) to incorporate conventional media ranking tools into an academic journal Web site. If one finds an interesting article, one could immediately flag it with these tools.

Now this is interesting, because peer-review is a nearly sacred notion in science. Your paper has not proven merit until it has passed the peer-reviewing process and been published. It basically says, "other scientists thought this was worth reading".

So what happens if the reviewing suddenly becomes open to all? Well, it essentially becomes a popularity contest. Digg is a perfect example: The stories that end up on the front page are the ones that a lot of people liked. Very democratic, isn't it? Only it means that today the stories included "Pranks to pull on your Co-workers" and "The 10 Most Mismatched Movie Couples".

Oh, there were plenty of interesting stories as well, but my point is that what's popular is not necessarily what's best. By all means, let people comment and review papers, but make sure we know which reviews come from scientists, and which come from your average schmoe. I know it sounds elitist, but it's not much more elitist than demanding that the person who takes out your appendix have a medical degree.

Friday, March 21, 2008

Some Reading for your Easter Weekend

I can't seem to find any interesting science stories today. (Maybe everybody's taking an extended weekend off?) So instead, I decided that it was time to round up some of the blogs that I read, but don't cite here so often.

FemaleScienceProfessor

Reading FSP makes me wish that I'd had somebody like her as a lecturer in my undergraduate years, rather than a series of boring white guys. (Not you Dr. S! Nor you Professor W!) She clearly cares about her students and loves her job, always a winning combination. Plus, I find her stories about careless misogyny in academia endlessly fascinating (in a horrifying, hope-I-never-act-like-that way).

Bioinformatics Zen

Not updated very often (kind of like this blog, huh?), but when it is, the articles are always worth reading if you're interested in the nitty-gritty of bioinformatics. Anything related to the field can come up here, whether it's the intricacies of programming, tips on how to get a PhD or humorous characterisations of stereotypical bioinformatics people.

Minor Revisions

This blog is a more recent addition to my reading schedule, but a charming one. Katie gives us a glimpse into the life of a biomedical engineering postgrad, and a very personal glimpse at that. I'm always impressed with people who are willing to share their ups and downs on their blog; it's a (skill? trait? strength?) I don't seem to possess. Katie likes having lots of subscribers to her feed, so go subscribe!

Saturday, March 15, 2008

Talking the Talk

And here's another post that I'm stealing from Of Two Minds: How to Give a Bad Science Presentation.

Of course, the advice they give applies to presentations given to fellow scientists, with the objective of introducing your work to them. And in that particular scenario, I probably agree with everything they say.

However, what if the aim of the presentation is not to inform, but to educate? In other words, what if you're giving a lecture? This is very topical for me, as I've just finished a course where students were giving presentations on papers, and I've had to do one of the presentations myself. We disregarded most of the rules they came up with. Were we right to do so? Well, let's look at the rules:

- Be able to give the presentation without support of the slides.

That one's a tricky one, because we were explaining a technique. In my part, I was heavily relying on examples to explain what was happening, and those examples were all on the slides. Could I have done it on the blackboard? Probably, but not without taking considerably more time. Still, we did rehearse a few times, so I think we could have brought the point across even without the slides. Overall, this rule holds.

- No outlines on the slides

Now this I can't completely agree with. Sure, giving an outline is slightly superfluous when you're repeating what it says on the slides. But if you're trying to get an unfamiliar topic across to an audience, reinforcement helps. During the presentations by other groups, I often found myself referring back to the slides when I hadn't caught what they were saying. I think outlines have their place in lecture slides.

- The less text the better

Two problems with this one: The first is the point that I just raised that it helps to refer back to the slide if you missed or were confused by what the speaker was saying. The second is that sometimes, the slides are made available to the audience as a study help before or after the talk. They effectively double as lecture notes, and so it is helpful if they contain enough detail so that you can understand them without the help of the speaker.

On the other hand, too much text can indeed be distracting during the presentation. So I'd advocate a compromise solution here: Keep the slides sparse, but provide detailed lecture notes at the end. Unless you're confident that your speaking ability is good enough to allow your audience to follow along easily and take notes while they do.

- Let us see the data

No argument there: Figures should be clear and big enough so that the audience can get a sense of what it is you're trying to show.

Monday, March 3, 2008

And Now For Something Completely Silly

For his inaugural post at the new Of Two Minds blog, Steve Higgins chose one of the most important scientific issues of our times: Could Superman's x-ray vision really exist?

There are three basic conditions that a superman x-ray system must meet to be plausible.

1. Transparency:
The rays must be such that all objects but lead are entirely or almost entirely transparent to them. Lead is always entirely opaque to the rays.

2. Color:
The rays and processor must result in Superman perceiving the same colors as would an Earthling viewing the scene in ordinary sunlight.

3. Exclusivity:
The rays must permit Superman, but not Earthling standing in line with the reflected rays, to see through normally opaque surfaces.

Steve wrote his article tongue-in-cheek, of course, but he bases it on a very real article that appeared in 1985 (!) in the journal Perception: "On the plausibility of superman's x-ray vision" by J.B. Pittenger.

And they wonder why everybody thinks scientists are a bunch of nerds...

Saturday, February 23, 2008

History in our Genes

Via the Wired Science Blog, I found this story about a study of variations in the human genome among different populations. While this has been done before, the new study shows how much data we can get just by looking at people's genes.

"The novel finding is the depth of the resolution we've gone to," said National Institutes of Health neurogeneticist Andrew Singleton, co-author of one of the papers in Nature. "This really lets you start moving towards locating individuals geographically. Previously, we've been able to look at the genome and say, 'This part is from Africa, this is from Asia. Now we can look past that and say, 'It's from this part of Africa or Eurasia.'"

Continued Singleton, "We can use these data to look at other areas of the genome that might have been under particular pressure for survival, and go from there to figuring out what the pressure is. One area that was highlighted was the genes responsible for digesting lactose. In countries where there's milk consumption, that one particular haplotype that allows more efficient lactose digestion has arisen."

They've not only been able to identify populations based on the genome alone, but they've also managed to model how humanity spread around the globe. Our history is encoded in our genes. How cool is that?

And all this was done using only the genomes of about a thousand people. Imagine what will be possible once we have even more data. And on the biology side, we will be able to repeat the same analysis for animals or plants. These are truly exciting times.

The full article that appeared in Science can be read here.

Thursday, February 14, 2008

Evolving Evolution

Okay, that title was too easy a pun. I apologise. But the Wired blog had a couple of posts about where evolution (as in, "the theory of...") may be headed. The name for this extension of evolution is complexity theory. I bet that's going to be confused with chaos theory a lot.

What is complexity theory? Let's let Wired explain:

Not a religiously inspired revision -- intelligent designers need not apply. Nobody suggests that genetic mutation and natural selection aren't responsible for the evolution of birds from reptiles or humans from tree-swingers.

But a growing number of scientists do say that neo-Darwinian evolution doesn't explain certain jumps in biological complexity: from single-celled to multicellular organisms, from single organisms to entire communities.

The jumps -- saltations, in complexity parlance -- appear to be non-linear emergent phenomena, the result of networked interactions that produce self-organization at ever higher levels. From this perspective, Darwinian evolution is a mechanism of a higher universal law, perhaps even a variant on the second law of thermodynamics.

There's something that will strike fear into the hearts of mathematicians and computer scientists everywhere: non-linear emergent phenomena. Everybody knows that that is the last thing you want to be modelling.

Personal fears aside, this seems like a reasonable extension. It is important to find out how certain networked properties emerge, whether they be multi-cellular organisms, or human societies.

Now, some people, including me before I read this article, might think that classic evolution is all we need to explain these jumps. Just because we haven't figured it out yet doesn't necessarily mean that we need a new framework, right?

Maybe, but the second Wired article that caught my eye gives more evidence that our current view of evolution may limit our ability to explain certain phenomena.

When Guy Hoelzer runs computer simulations of organisms living in the modeling equivalent of a featureless plain, he sees them break into different species -- even though there's no reason for natural selection to take place.
That preliminary but tantalizing finding hints at some larger phenomenon driving the mechanisms of neo-Darwinian evolution. Hoelzer thinks the phenomenon is self-organization: combine energy with complex networked interaction and order will emerge.

As with all experiments based on simulation, you have to take this with a grain of salt, but it's certainly more fodder for complexity theory.

One thing I don't quite agree with is that they try to relate it back to the second law of thermodynamics. ("The entropy of an isolated system not in equilibrium will tend to increase over time, approaching a maximum value at equilibrium.") That seems doubtful to me (speciation is entropy?) and also a bit premature. Save the unification with physics for when you've worked out the details of your theory.

Thursday, February 7, 2008

It's Started Already

Remember the "telepathic" DNA story? Well, what'd I tell you:

DNA Found to Have "Impossible" Telepathic Properties

[...]No one knows how individual DNA strands could possibly be communicating in this way, yet somehow they do. The “telepathic” effect is a source of wonder and amazement for scientists.

So we've moved from a "telepathy-like quality", that can be explained, to "impossible telepathic properties". I predict that we'll reach alien influences in two more steps.

Tuesday, February 5, 2008

I love Bad Science

Let me clarify: I hate bad science, but I love Bad Science. Ben Goldacre cuts right through the bull and exposes charlatans, cranks and clueless journalists. In his latest column, he dissects a collection of recent articles that are full of bad science.

I know I’m wrong to care. On the BBC news site “crews were hopeful the 20m cubic litres of water could be held back and not breach the dam wall”. And that’ll be a struggle, since “cubic litres” are a nine-dimensional measuring system, so the hyperdimensional water could breach the dam in almost any one of the five other dimensions you haven’t noticed yet.

Seriously, go read it.

Sunday, January 27, 2008

Interesting Article, Bad Title

LiveScience had an article about a recent discovery in genetics made by some researchers in Maryland. They found out that DNA sequences that are the same are more likely to cluster together than those that are different.

Curiously, DNA with identical sequences of bases were roughly twice as likely to gather together as DNA molecules with different sequences.

[...]

The electrically charged chains of sugars and phosphates of double helixes of DNA cause the molecules to repel each other. However, identical DNA double helixes have matching curves, meaning they repel each other the least, Leikin explained.

It's an interesting discovery, although it remains to be seen how useful it's going to be. But that's not the reason I'm mentioning it here. What made me laugh was the title they gave the article:
DNA Molecules Display Telepathy-like Quality.

How long until the first New Age healer picks up on this and claims he can heal your DNA by telepathy?

Wednesday, January 23, 2008

Woe is PhD

So I'm looking for a PhD place. Ideally, I'd like it to be at my current university, as the other universities in the UK that are good in my field are mostly located in London, and I don't like that city much. Nor can I really afford to live there.

I'd like it to involve Bioinformatics, but not require any wetlab work that I would have to do myself. There should also be scope for applying Machine Learning techniques. There are two areas of research that interest me. One is work in genetics, such as gene regulation modelling, or protein structure and function prediction. The other is to model biological systems at the macro level and predict how changes in the environment influence animal population size or plant growth.

I could also see myself doing a straight Machine Learning project without any Bioinformatics involved, but with slightly less enthusiasm.

I'd like to have a supportive supervisor, who I can talk to before I start my PhD and who will advise me on how to write up my project proposal. He or she doesn't need to be an academic superstar, but a fair number of publications and at least some amount of recognition in either Bioinformatics or Machine Learning would not go amiss.

I'd like to get the chance to teach during my PhD, either tutorials or even lectures.

A scholarship would be helpful. If I don't get one, my parents could help, but I'd like to be able to support myself for once.

And tomorrow, I will meet with a potential supervisor who might be able to offer the place that has most of these characteristics. (I'm not sure about the teaching yet.) Fingers crossed!