Friday, December 21, 2007

Google's Newest Invasion

Having cornered the search market, taken over YouTube and struck fear into the hearts of publishers, Google is licking its lips and looking for new targets. Next stop: Wikipedia.

The potential Wikipedia killer that Google is developing is called Knol, and it has of course sparked much discussion in the blogosphere, with bloggers trying to outdo each other with the wittiest pun. (My favourite: Google Sets its Guns on the Grassy Knol.) But puns aside, should Wikipedia be worried? Should we?

Yes and no. You have to remember that Google does not always succeed. Remember Google Video? That didn't take off. Google News? I don't know any news junkie who uses it. And I don't see people rushing to pick up Google Talk.

Knol operates on a completely different model from Wikipedia. Instead of one page per topic that anybody can edit if they think they know better than the original author, in Knol the author controls the content, and other users can only suggest changes. There can also be more than one "knol" on a given topic. Google assures us that the more popular (and hence, presumably, more accurate) knols will float to the top, but will they really?

Of course, Wikipedia has a similar popularity problem, but there it's mitigated because you can correct information; with knols, it seems you can only decide whether you like the information or not.

Then there is the problem of orphaned topics: What happens if a "good" knol is abandoned by the author? Can we reuse it to start a new knol? Or is that information frozen in time forever, neither to be reused nor updated?

There are tons of problems that could bring Knol to its knees. Of course, there are also reasons why it could succeed: there's less risk of the vandalism Wikipedia has seen, and there are more incentives for people to contribute (Google has agreed to share ad revenue with knol owners who let it place ads on their pages).

Whether or not Knol succeeds, I believe the two models are different enough that they could exist side by side. After all, Encarta and the Encyclopaedia Britannica both sold copies, didn't they?

Thursday, December 13, 2007

Pay Per Use Bioinformatics Software

Equinox, a company based in London and spun out of Imperial College, has started offering access to some of its bioinformatics tools on a pay-per-use basis. Basically, they host the software on their servers, and you pay them each time you want to run a query. Their flagship product has to do with protein structure prediction:
The first product available will be Equinox's leading Phyre(TM) homology modelling and fold-recognition software. User research has shown that proteomics is an ideal target market with positive feedback from research, biotech and pharma audiences.
Yadda, yadda, the press release goes on to state how great this all is. Of course, my knee-jerk reaction was: "Pay? Nevah!" But then I realised that you'd have to pay anyway: previously, this product had only been available via a software licence. Now, that's fine for big companies and universities, who use it frequently; I suspect most of them will stick with the licence.

But for smaller companies, or an isolated researcher who may only need to use the software once in their career, the pay-per-use model may actually have an advantage. Of course, it would be preferable if they offered the tool for free, but what kind of business model is that?

Unfortunately, I couldn't find out what the actual price per use is, or how it compares to the cost of a licence. How many uses before a licence becomes cheaper? They need to get the balance right; otherwise people will opt for licences every time, and the pay-per-use idea might well die a premature death, as far as bioinformatics is concerned.
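The break-even arithmetic is trivial, at least: it's just the licence cost divided by the price per use. A sketch with made-up numbers, since Equinox publishes neither figure:

```python
from math import ceil

licence_cost = 5000.0   # hypothetical annual licence fee
price_per_use = 50.0    # hypothetical price per query

# Queries per year above which a licence becomes the cheaper option.
break_even = ceil(licence_cost / price_per_use)
print(f"A licence pays for itself after {break_even} queries a year")
```

If they price a query at one percent of the licence, the crossover sits at a hundred queries a year, which feels about right for separating hobbyist use from industrial use.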

Meanwhile, if you want to play around with a bare-bones version of Phyre, you can still go here. Just put on your academic hat first.

Thursday, December 6, 2007

Open Source Genetics?

Here's an interesting premise: Wired Blog has an article on "The Open Organism: Genetic Engineering in the Open Source Era". What would happen if you applied the principles of Open Source Software to genetic engineering?
Modularity in computer science has helped unleash crazy amounts of creativity, and new business models derived from user-generated content. Take Google Maps open-API. Or even HTML itself, which allowed users to create graphically sophisticated pages with no real programming knowledge. By putting the hard stuff into a black box and just letting you access what you need to know, user/producers have been able to focus on creating interesting content quickly and easily. What if, in the next decade, the same group of elite users/coders could do the same thing with corn?
They might be a little too optimistic, in my opinion. The allure of Open Source (or indeed any kind of hacking) is that anyone can do it. You don't need much initial investment beyond a computer, which you likely already have. Install Linux, get a GNU compiler of your choice, fire up a text editor and you're in business.

Genetic engineering is not like that. Or maybe it's exactly like that, but on a far grander scale. Instead of a computer*, you need a lab: pipettes, petri dishes, microscopes, solutions, PCR machines, microarrays, maybe even a gene sequencer. These things don't come cheap.

But let's assume for a moment that you have all of that already. You'll still need the things every programmer takes for granted: the libraries and APIs containing shortcuts for all the common tasks you don't want to build from the ground up. As a genetic engineer, that means promoters, restriction enzymes and specialised vectors, each different depending on the organism you started with.

It is always possible that genetics labs and components will someday become as ubiquitous as computers and code libraries. I'm sure that back when we were feeding punch cards into room-sized mainframes, open source software development seemed as far away as open source genetic engineering does today. But that transition still took thirty years. I don't think we have to worry about it just yet.

*Or rather, in addition to.

Thursday, November 15, 2007

Reviewing Fun

One of the required courses for my Master's is a literature review on a topic we can choose ourselves. So I've been reading lots and lots of papers (on Bayesian networks for modelling gene regulation, in case you want to know), and the more I read, the more I see certain common themes emerge. Not common themes in the topic, mind you, just in papers in general.

First of all, most papers can be summarised pretty easily. However, the summary I would come up with almost never matches the abstract that the authors wrote. I realise this is a function of their desire to show every aspect of the paper in the abstract, while I would summarise only the most important ones (which might be subjective), but I'm still left with the feeling that most abstracts don't reflect the gist of the paper.

Secondly, too many papers overuse references. I've read papers with two pages of text and three pages of references. What especially ticks me off is when the author mentions a topic and then gives five references for it. We don't need five references, we need one good reference. Maybe two, if there are two particularly good papers and you can't decide. Five is just overkill.

Thirdly, and finally, I've noticed a distinct lack of detail in some explanations. That I can understand if you're trying to boil a paper down to two or three pages for publication. But if you're going to gloss over something, at least say that you're doing so. And since this is the 21st century, how about providing a link to a webpage where the details can be found?

Monday, November 5, 2007

Undergrad Proves Important Theorem

I read on the Wired blog that a 20-year-old engineering student from the University of Birmingham has formally proved that a tiny Turing machine devised by Stephen Wolfram* is universal, which makes it the simplest possible universal Turing machine.

Turing machines, for those not up on their theoretical computer science, are simple abstract computing machines that Alan Turing conjectured were capable of calculating any computable function (he didn't say anything about efficiency).

The usual model is that of a machine with a tape carrying a sequence of symbols, plus a number of internal states. The machine looks at the tape one symbol at a time and, depending on the symbol it reads and its current state, decides what to do: write a symbol over the current one, move one cell left or right, and switch to a new state.

What makes this minimal Turing machine special is that it has only two states and three different symbols (sometimes called colours). The student proved that this machine is universal, and machines with fewer states or colours are known not to be, so it really is as small as a universal Turing machine can get.
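If that all sounds abstract, a working simulator fits in a dozen lines of Python. A minimal sketch; the transition table below is a made-up two-state, two-symbol example for illustration, not Wolfram's 2,3 machine:

```python
def run_turing_machine(rules, tape, state, steps):
    """rules maps (state, symbol) -> (new_state, new_symbol, move),
    with move = +1 (right) or -1 (left). The tape is stored as a dict
    from position to symbol, so it is unbounded in both directions."""
    tape = dict(enumerate(tape))
    pos = 0
    for _ in range(steps):
        symbol = tape.get(pos, 0)          # blank cells read as 0
        state, tape[pos], move = rules[(state, symbol)]
        pos += move
    return [tape[i] for i in sorted(tape)]

# A made-up two-state, two-symbol machine: flips bits while moving right.
rules = {("A", 0): ("B", 1, +1),
         ("A", 1): ("B", 0, +1),
         ("B", 0): ("A", 1, +1),
         ("B", 1): ("A", 0, +1)}
print(run_turing_machine(rules, [0, 1, 1, 0], "A", 4))  # [1, 0, 0, 1]
```

The hard part, of course, is not simulating a machine but proving that a given transition table can emulate any other one. That's what took a 40-page proof.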

Meanwhile, what important result did I discover during my time as an undergraduate? Oh, that's right, I discovered that you can live on nothing but pizza for a week...

*Yes, the same one who created Mathematica.

Wednesday, October 24, 2007

Sandra Porter over at Discovering Biology in a Digital World has some computer woes from her Bioinformatics class to relate. All I can say is, I'm glad our Bioinformatics course is run by the Informatics department. Although so far, computer use for that course has been minimal.

Tuesday, October 16, 2007

No, I'm not Dead

Just busy. It appears that while I was trying to catch up on uni work (imagine a hamster in a wheel, running as fast as he can, always convinced he'll get there eventually), Al Gore won the Nobel Peace Prize.

Now, some people are wondering why he won. There are two responses to that.

First, there's nothing more important to peace than stopping global warming. Pandagon has a more elaborate explanation of this point than I'm willing to give.

Second, get some perspective. The Nobel Peace Prize has gone to Yasser Arafat before, and you're complaining about Al Gore getting it? Now, if they were giving it to George Bush, then you'd have something to complain about...

Monday, October 8, 2007

Quantum Confusion

For every computer scientist waiting for the first working quantum computer, there are two hoping that day will never come, because it would mean having to start developing quantum algorithms. Why is that a problem? Well, have a good look at Shor's quantum algorithm for integer factorisation. Yep, it's not pretty; in fact, I'm not sure I could work out how it functions from that article alone.

Fortunately for me and other lazy computer scientists, Scott Aaronson has provided an easy-to-understand guide to Shor's algorithm. In the process of explaining it, he also debunks some of the most common myths about quantum computing:

Look: if you think about quantum computing in terms of “parallel universes” (and whether you do or don’t is up to you), there’s no feasible way to detect a single universe that’s different from all the rest. Such a lone voice in the wilderness would be drowned out by the vast number of suburb-dwelling, Dockers-wearing conformist universes. What one can hope to detect, however, is a joint property of all the parallel universes together — a property that can only be revealed by a computation to which all the universes contribute.

(Note: For safety reasons, please don’t explain the above to popular writers of the “quantum computing = exponential parallelism” school. They might shrivel up like vampires exposed to sunlight.)
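Since I brought it up, here's the shape of the thing. Shor's algorithm is mostly classical number theory wrapped around one quantum subroutine: finding the period of a^x mod N. A sketch in Python, with the quantum step brute-forced classically (which of course throws away the entire speedup); N is assumed to be odd, composite and not a prime power:

```python
from math import gcd
from random import randrange

def find_period(a, n):
    """Find the order r of a modulo n: the smallest r > 0 with
    a^r = 1 (mod n). This is the step Shor's algorithm performs on
    a quantum computer; here we brute-force it classically."""
    r, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        r += 1
    return r

def shor_factor(n):
    """The classical reduction from factoring to period finding."""
    while True:
        a = randrange(2, n)
        d = gcd(a, n)
        if d > 1:
            return d  # lucky guess: a already shares a factor with n
        r = find_period(a, n)
        if r % 2 == 0:
            y = pow(a, r // 2, n)
            if y != n - 1:  # skip the trivial case a^(r/2) = -1 (mod n)
                f = gcd(y - 1, n)
                if 1 < f < n:
                    return f

print(shor_factor(15))  # prints 3 or 5
```

Everything above runs happily on a classical machine; the magic (and all the ugliness) is in doing find_period in polynomial time with a quantum Fourier transform.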

Thursday, October 4, 2007

On Participation During Lectures

I don't speak up much in lectures. That's not because I'm shy (well, not mainly). Nor is it because I have nothing to say. No, I'd say that the main reason is lack of opportunity.

I should start by explaining that the British lecture system does not encourage participation. Unlike in the American system (I'm told), you are lectured at, and 90% of lecturers expect nothing more than polite attention from you. Very rarely are you actively asked to participate.

That leaves only two opportunities for speaking up: Answering questions that the lecturer asks and asking your own. And to be fair, a lot of people do answer and ask questions. But I have a problem with both.

When it comes to answering questions, the problem is the sort of questions that get asked. Some are too hard: I've never been good at thinking on my feet, so I seldom work out the answer to a hard question during the lecture itself. Most of the rest are too easy: if the answer is one line further down on the slide, I'm not going to bother calling it out.

Asking questions presents a different problem. I've found over the years that most of the questions I would ask in a lecture, I can answer for myself by looking over the notes afterwards. That knowledge, and a bit of pride, keep me from asking them during the lecture.

I realise that all of these problems could be solved by changing my attitude, but it doesn't seem worth the effort. I don't get the feeling that more participation will give me a deeper understanding of the material. It just seems like so much wasted effort. And maybe that's one of the flaws of the British system.

Wednesday, October 3, 2007

Do WHAT for the Forest?

This doesn't have anything to do with machine learning, but I just couldn't pass it up. Fuck for Forest* is a porn site for and by environmentalists. Yep, there really is porn for every niche on the internet. Here's how they describe themselves:
FFF are concerned humans, exploring the power of sexuality, to save nature and liberate life. We know a lot of people are interested in sexuality, including us. We want to have fun with sex, show natural people and collect money for saving nature! We think it is time to pay respect to nature, and give back with love!
I guess the end justifies the, er, porn?

* If you believe that this is safe for work, you probably don't deserve your job anyway.

Tuesday, October 2, 2007

Skynet can't be far off...

From the scary side of AI comes this report from Wired: The US government is sponsoring research into tracking terrorists on the Web. Now that might sound innocuous, but according to the article:

The University of Arizona's ultra-ambitious "Dark Web" project "aims to systematically collect and analyze all terrorist-generated content on the Web," the National Science Foundation notes. And that analysis, according to the Arizona Star, includes a program which "identif[ies] and track[s] individual authors by their writing styles."

That component, called Writeprint, helps combat the Web's anonymity by studying thousands of lingual, structural and semantic features in online postings. With 95 percent certainty, it can attribute multiple postings to a single author.

From there, Dark Web has the ability to track a single person over time as his views become radicalized.

The project analyzes which types of individuals might be more susceptible to recruitment by extremist groups, and which messages or rhetoric are more effective in radicalizing people.

You can probably imagine what would happen if Writeprint were used to track down real terrorists and made a mistake. Better not type too heatedly in those flame wars. And stay away from "they set us up the bomb" jokes!

I'm not condemning the research as such: The idea of being able to tell who a certain text was written by is fascinating. But the application is worrisome. If this ever becomes a viable tool for counter-terrorism, it should be very strictly controlled.
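Out of curiosity, here's roughly what the simplest version of the idea looks like. This is my own toy sketch using character trigram frequencies, not Writeprint's actual feature set (which reportedly runs to thousands of lingual, structural and semantic features), and the sample texts are made up:

```python
from collections import Counter
from math import sqrt

def style_vector(text, n=3):
    """Character n-gram frequencies as a crude stylistic fingerprint."""
    text = text.lower()
    grams = Counter(text[i:i+n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()}

def similarity(v1, v2):
    """Cosine similarity between two sparse fingerprint vectors."""
    dot = sum(v1[g] * v2.get(g, 0.0) for g in v1)
    norm1 = sqrt(sum(x * x for x in v1.values()))
    norm2 = sqrt(sum(x * x for x in v2.values()))
    return dot / (norm1 * norm2)

# Attribute an unknown posting to the known author it most resembles.
known = {
    "author_a": style_vector("I attack the darkness with my +5 sword. " * 20),
    "author_b": style_vector("Synergise the paradigm going forward. " * 20),
}
unknown = style_vector("Attack! My sword of darkness strikes again.")
print(max(known, key=lambda a: similarity(known[a], unknown)))
```

Getting from a toy like this to "95 percent certainty" on real forum postings is, I suspect, where all the research effort goes.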

The Wired article focuses on Writeprint, but a quick look at the website for the Dark Web project shows some more interesting projects:
The Terrorism Knowledge Portal is a search engine created specifically for the domain of terrorism research. [..] It aims to explore governmental, social, technical, and educational issues relevant to supporting intelligent Web searching in terrorism-related research. The portal supports searching of a customized terrorism research database with over 360,000 quality pages. In addition, it provides access to terrorism research institutes, government Web sites, news and presses, and a collection of useful Web resources for researchers.
A terrorism search engine? When is Google getting in on this?
A computer-driven natural language chatterbot that will respond to queries about the terrorism domain and provide real-time data on terrorism activities, prevention, and preparation strategies.
Real-time data on terrorism activities? Good luck with that.

Finally, to close on a lighter note, the Wired article quotes the National Science Foundation on some of the risks of the project:

"They [terrorists] can put booby-traps in their Web forums," Chen explains, "and the spider can bring back viruses to our machines." This online cat-and-mouse game means Dark Web must be constantly vigilant against these and other counter-measures deployed by the terrorists.

Right. Because obviously you instruct your spider to download every file it finds straight onto your servers, execute it, and maybe display a skull logo on your monitor while the virus deletes your files. I understand the desire to make the work sound glamorous by phrasing it as a battle, but please don't pretend you don't know about basic security procedures.

Monday, October 1, 2007

Three-Toed Sloth has a thoughtful examination of heritability in IQ, complete with disclaimer:
Attention conservation notice: It's long, and it's about something which makes eyes glaze over even as tempers flare up, and it's not funny at all. Worse yet, more is on the way. You could always read it later, but time spent now is gone forever.
The conclusion? IQ may depend much less on your genetic heritage than previously assumed. I find some of his arguments quite convincing.

Sunday, September 30, 2007

Injustice in Open Source

Rob Knop of Galactic Interactions has an article about misogyny in open source development.
Whereas the number of women in biology and chemistry has improved a lot in recent decades, the number of women in Physics creeps up much more slowly. Meanwhile, in computer science, the number of women has actually been declining. As for the absolute values of those numbers, one need only look at a picture of a Linux Kernel Developers' Summit to realize that within statistical uncertainty, the number of Y chromosomes is the same as the number of people in the picture.
Knop attributes this to the misguided belief of men in these fields that they are somehow smarter than the rest of the populace, and hence also smarter than women. It must be that way, because you don't see many women computer scientists, do you? It never occurs to them that their attitude might be the reason many women don't try for a career in computer science.

While I've never worked in open source, I tend to agree with Knop's assessment. Now here's the million-dollar (or million-women-in-computer-science) question: how do we change this state of affairs?

Sure, it would be nice if open source developers were more welcoming to women, but let's assume for a moment that they're not that cooperative. There's certainly no way to force them; after all, these projects are theirs to do with as they please. So how can women fight back?

What I would like to see is more open source projects started by women, with a quota of 50% or more women developers. Maybe a little community could spring up around it. Think of it as Sourceforge with more X chromosomes. There could be discussion forums for female software developers, and maybe a blog collecting instances of misogyny and of female successes. All it takes is a few women developers getting together, buying a couple of servers, coming up with a light-weight content management system, and you're off. No males required.

Friday, September 28, 2007

What's So Special about the Humanities?

The school finally showed some mercy and moved us to a new lecture theatre for the Probabilistic Modelling class. Not only does this mean that we no longer have to suffocate in a small room, but the new venue is also in one of the old buildings of the University.

I've very seldom been inside the old buildings, except on special occasions like exams and graduations. Mostly, they house the Humanities, along with the Schools of Law and Medicine.

It's a complete change of scenery. Where we get ugly buildings from the seventies and eighties, with sparsely furnished, functional lecture theatres, the other schools are housed in huge stone buildings with marble arches, balconies and skylights. I mean, I get it, they have been around longer than Science and Engineering, but seriously, would it kill the University to at least give us some lectures in nice surroundings?

Thursday, September 27, 2007

Newsflash: Earth Simulator Exists, Written in Fortran

This is hilarious: a professor from the University of Texas claims to have written a complete simulation of the planet Earth. By himself. In Fortran. Oh, and it runs in a matter of minutes. There are so many unbelievable, nay, impossible claims in his paper that it's hard to decide where to start dissecting it. Fortunately, Mark Chu-Carroll over at Good Math, Bad Math has already done it better than I could.

I think that accomplishing that [incorporating quantum mechanics] would easily win him another Nobel prize, in addition to the Nobel for the non-quantum simulator. All he'd need to do is publish the data. Wouldn't that be a coup? A creationist professor from a diddly little school in Texas showing up all of the best and brightest physicists in the world, with something he did on a lark with one of his friends? Gosh, why do you suppose that he hasn't published this? Hasn't shown anyone the simulation? You think that maybe, just maybe, there's a reason for that?

I suppose that Granville, modest gentleman that he is, might not like the spotlight that these awards would generate. That must be why he only mentions this astonishing feat of brilliance in a piece of sloppy apologetics.

Assignment Time

I just heard that the first assignment this year will be about spam detection. Ironically, spam detection is exactly the topic I was looking at late last year when choosing a project for the Google Summer of Code. If I had actually got off my behind, put together a proposal and done that project, this first assignment might be a breeze. Oh well.
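In the meantime, I can at least guess at the shape of it. The classic starting point for spam detection is a naive Bayes classifier; here's a toy sketch with made-up training data (no idea yet what the actual assignment will ask for):

```python
from collections import Counter
from math import log

def train(messages):
    """messages: list of (text, is_spam) pairs. Returns per-class
    word counts and message counts for naive Bayes."""
    counts = {True: Counter(), False: Counter()}
    totals = Counter()
    for text, spam in messages:
        counts[spam].update(text.lower().split())
        totals[spam] += 1
    return counts, totals

def is_spam(text, counts, totals, alpha=1.0):
    """Compare log-posteriors of the two classes, with add-alpha
    smoothing so unseen words don't zero out a class."""
    vocab = len(set(counts[True]) | set(counts[False]))
    scores = {}
    for label in (True, False):
        n = sum(counts[label].values())
        score = log(totals[label] / sum(totals.values()))
        for word in text.lower().split():
            score += log((counts[label][word] + alpha) / (n + alpha * vocab))
        scores[label] = score
    return scores[True] > scores[False]

training = [("buy cheap pills now", True),
            ("meeting notes attached", False),
            ("cheap viagra offer", True),
            ("lunch tomorrow?", False)]
counts, totals = train(training)
print(is_spam("cheap pills offer", counts, totals))  # True
```

The "naive" part is the assumption that words are independent given the class, which is obviously false and works surprisingly well anyway.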

Tuesday, September 25, 2007

Cooking for Engineers

Ever been frustrated with imprecise recipes? A pinch of salt: what exactly is a pinch? And does "whisk" mean whisking until stiff, or just until combined? Well, Cooking for Engineers promises to change all that, with recipes made for engineers, by engineers.

Of course, we'd better hope they're not software engineers, otherwise we'd have to start cooking before we get the recipe, change ingredients after putting them in, and finish up by testing the meal on the dog.

Monday, September 24, 2007

Predicting Antibodies

Here's some interesting news from ScienceDaily: Researchers at MIT have developed a computational model that can predict how given changes to an antibody can influence its effects.

Traditionally, researchers have developed antibody-based drugs using an evolutionary approach. They remove antibodies from mice and further evolve them in the laboratory, screening for improved efficacy. This can lead to improved binding affinities but the process is time-consuming, and it restricts the control that researchers have over the design of antibodies.

In contrast, the MIT computational approach can quickly calculate a huge number of possible antibody variants and conformations, and predict the molecules' binding affinity for their targets based on the interactions that occur between atoms.

The interesting bit is the prediction. It's presumably easy enough to model changes to an antibody (essentially a protein), but predicting what the new version will actually do is much harder. I can't wait to read the paper.

Sunday, September 23, 2007

Breasts on Facebook?

Brace yourselves. What I say next may shock you. Ready? Okay:

There were breasts on Facebook.

Don't panic! Breathe. Stay with me. The good news is, Facebook acted decisively and removed ALL of the...

Wait, hang on.

Okay, I'm being told they didn't remove all of the pictures of breasts. Apparently, they removed pictures of a woman breast-feeding and banned the woman in question from Facebook.

Now, even most prudes would find it hard to condemn breast-feeding as obscene. See, contrary to popular (read: male) opinion, breasts actually serve another purpose besides being sexually arousing. In deleting the pictures, Facebook was focusing on the nudity of said breasts, and many commentators are making the same mistake. Instead of saying "Facebook deletes pictures of woman's breasts", how about "Facebook deletes pictures of woman feeding her baby"? That's all the picture was about: a baby being fed. How much more innocuous can you get?

Saturday, September 22, 2007

How Quaint

I don't know about you, but I barely remember web searching without Google. So it's kind of quaint to see the humble beginnings of Google in this 1998 paper, The Anatomy of a Large-Scale Hypertextual Web Search Engine:
Search engine technology has had to scale dramatically to keep up with the growth of the web. In 1994, one of the first web search engines, the World Wide Web Worm (WWWW) [McBryan 94] had an index of 110,000 web pages and web accessible documents. As of November, 1997, the top search engines claim to index from 2 million (WebCrawler) to 100 million web documents (from Search Engine Watch). It is foreseeable that by the year 2000, a comprehensive index of the Web will contain over a billion documents. At the same time, the number of queries search engines handle has grown incredibly too. In March and April 1994, the World Wide Web Worm received an average of about 1500 queries per day. In November 1997, Altavista claimed it handled roughly 20 million queries per day. With the increasing number of users on the web, and automated systems which query search engines, it is likely that top search engines will handle hundreds of millions of queries per day by the year 2000. The goal of our system is to address many of the problems, both in quality and scalability, introduced by scaling search engine technology to such extraordinary numbers.
What is this "Altavista" thing they keep talking about?
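Nostalgia aside, the paper's centrepiece, PageRank, is remarkably compact. Here's a toy sketch of the iteration as the paper defines it (this version assumes every page has at least one outgoing link):

```python
def pagerank(links, d=0.85, iters=50):
    """Power iteration for PageRank as given in the paper:
    PR(p) = (1 - d) + d * sum over pages q linking to p of
    PR(q) / outdegree(q). `links` maps each page to the pages
    it links to; d is the damping factor."""
    pages = list(links)
    pr = {p: 1.0 for p in pages}
    for _ in range(iters):
        pr = {p: (1 - d) + d * sum(pr[q] / len(links[q])
                                   for q in pages if p in links[q])
              for p in pages}
    return pr

toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
print(pagerank(toy_web))  # "c" ends up with the highest rank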

Friday, September 21, 2007

Some Thoughts About Recaps

So today's course on Probabilistic Modelling started, not surprisingly, with a recap of basic probability theory. I'm not objecting to that; there were clearly people in the class who hadn't done probability before. However, I had the same recap last year in a Modelling and Simulation course, and the year before that in an Artificial Intelligence course.

The repetitiveness got me thinking: why waste time on all these separate recaps? Wouldn't it be more elegant to organise a single one- or two-hour class each year that recaps probability theory, which everyone who needs it could attend?

Sure, there might be scheduling problems, but it still seems better than subjecting everyone to the same recap and forcing three different lecturers to teach the same material at roughly the same time. And I'm sure probability theory is not the only subject that keeps recurring in recaps.

Thursday, September 20, 2007

First Impressions

The view of the city is lovely from the 7th floor of the skyscraper where our first lecture was held. On a clear day like today, you can see all the way to the hills in the distance. I know this because I had lectures there during my undergraduate career. I certainly didn't notice any of it today, crammed as I was into a corner of the room, trying desperately not to suffocate as more and more people piled in.

There are three kinds of lectures in Informatics: those that contain so much Maths that only joint CS/Maths students take them, those that focus on cramming as much knowledge as possible into our heads, and those that don't require much Maths or knowledge but teach us how to actually solve problems. You can probably guess which is the most popular kind.

This lecture was one of those, and despite having a certain fondness for the mathematical courses, I'm not complaining. I merely wish that, at the next lecture, there'll be fewer people and more room for other things. Like, for a start, oxygen.

Wednesday, September 19, 2007

Obligatory Introduction

If you're reading this, I can only assume that you took a wrong turn somewhere in the network of internet tubes. You misclicked, your mouse shattered on the ground, and you've ended up on my blog. Now you're staring at your screen in horror, desperately trying to remember the keyboard shortcut for the Back button.

Fear not, gentle reader. You're quite safe with me. I do not use this blog to present my Hentai collection to the world or introduce you to the latest in body modification (don't do a Google Image search!). I won't even post images of my cat, mainly because I don't have a cat. I promise that nothing on these pages will make you want to scoop your eyes out with a spork. Unless I link to Warren Ellis.

So what can you expect from this blog? Well, I am currently starting an MSc programme focused on Machine Learning at a wonderful British university, and I expect that will colour the Gaussian Noises that this blog emits. Look forward to reading my take on current research, old ideas, and any weirdness I come across. Of course, I reserve the right to post off-topic once in a while. But I promise: no cats!