Why I’d Let Google Put My Genome in the Cloud

For the past 18 months, Google has been quietly rolling out a cloud computing service for DNA. Google Genomics could one day have millions of genomes on its servers, available to researchers. Are there legitimate privacy concerns here? Definitely, but it’s not Google’s grubby fingers you should worry about.

Genetic databases already exist online, of course, and Google Genomics is only the latest (and most ambitious) iteration. There are genealogy databases for finding your ancestors and long-lost relatives. There are publicly available genetic databases run by national research centres. And there are dozens of datasets that research groups share with others on a case-by-case basis.

Google Genomics is going around to research centres and universities offering to host their genome sequences for $25/£16 per genome each year. The more genomes it can collect in a central repository, the easier it could be for researchers to share their data. A genome sequence by itself is useless. Without comparing it to others, you don’t know what is a mutation and what is normal. Take two genomes, and you can start having some idea, but you’ll still be swamped by the hundreds of variations. With a database of dozens, hundreds, thousands of genomes, you get a much better chance of pinpointing the mutations that matter. The bigger your database, the better.
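To make the "bigger is better" point concrete, here is a toy sketch in Python. It is not how Google Genomics or any real analysis pipeline works: the variant labels, the `rare_variants` helper and the 5% cut-off are all made up for illustration. The idea is simply that a variant carried by most other people is probably normal, so a larger comparison set filters out more of the noise.

```python
from collections import Counter


def rare_variants(patient, cohort, max_frequency=0.05):
    """Return the patient's variants carried by at most max_frequency of the cohort.

    Hypothetical helper for illustration only; real variant filtering is
    far more involved than counting labels.
    """
    if not cohort:
        # Nothing to compare against: every variant is still a candidate.
        return set(patient)
    carriers = Counter()
    for genome in cohort:
        # Count each cohort genome at most once per shared variant.
        carriers.update(set(genome) & set(patient))
    return {v for v in patient if carriers[v] / len(cohort) <= max_frequency}


# Made-up data: each letter stands for one variant found in a genome.
patient = {"A", "B", "C", "D"}
one_genome = [{"A", "B"}]
twenty_genomes = [{"A", "B"}, {"A", "C"}, {"A", "B", "C"}, {"B"}] * 5

print(sorted(rare_variants(patient, [])))              # ['A', 'B', 'C', 'D']
print(sorted(rare_variants(patient, one_genome)))      # ['C', 'D']
print(sorted(rare_variants(patient, twenty_genomes)))  # ['D']
```

With nothing to compare against, all four variants look suspicious. One other genome rules out two of them, and twenty genomes leave a single candidate.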

Sequencing genomes keeps getting cheaper and easier, but sharing the resulting terabytes of data has not. The files are unwieldy, and different datasets are scattered among different research groups, often available on a case-by-case basis. In contrast, Google wants to build one centralised database where a researcher can query millions of genome sequences at once. This is the infrastructure for personalised medicine.