Wednesday, 29 July 2015

First Bioinformatics Hackathon in Turkey: English Summary, 27-29 July 2015

These three days made up our 48-hour bioinformatics hackathon, aptly named Biyo Hackathon, in which two teams, the Chickpeas and the Beans, worked through increasingly challenging tasks related to de novo sequencing. The Biyo Hackathon is the first bioinformatics hackathon to take place in Turkey, and as the field grows we hope it will help draw more interest to bioinformatics. Our hackathon started at ~9 AM on Monday and ended at ~9 AM on Wednesday. During this time we had little chance to rest, as everyone wanted to beat the other team and so chose to stay awake most of the time. Because of the workload over these 3 days, the following notes were taken in bullet-point style to save as much time as possible. The results of our coding can be found on our wiki at

The participants will be posting blog updates with their opinions on the Biyo Hackathon in the coming days, so keep your eyes peeled. In the meantime, let's take a look at what we did over the course of ~50 hours spanning 3 Gregorian days...

  • brought in the survival supplies from Mr. Ahmet’s car
  • got ready for the coding start with our two teams, the Beans and the Chickpeas
  • first task was to write code to analyze a given nucleotide/protein sequence
  • an option selects the input type, s for sequence or p for protein, so each gets separate results
  • must put all of the tasks into separate functions so we can call any function we desire
  • we the Chickpeas lost due to Mr. Ahmet of the Beans wrecking us with his speed
  • translated the descriptive information from Turkish to English
  • eating break at 14:15
  • used sorted for alphabetical ordering and Counter's most_common for ordering by count
  • try to use -1 for all errors
  • losing team makes the sandwiches for lunch
  • moving on to the second task: FASTQ sequence analysis using the Phred scoring system
  • troubles with IPython in getting our code to run (such as not registering tabs correctly)
  • speed test between the groups to see who will have the fastest time to analyze a large FASTQ file (AR2_S3_L001_R1_001)
  • second task complete at 22:10, Chickpeas lost again, by a 2-second difference in how fast the code printed its results
  • 22:45, our pizza is late, so the delivery should be free
  • 23:10, we cancelled the pizza that never arrived so now we are eating our prepared sandwiches
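
The first task in the notes above (an s/p switch, separate functions, sorted and Counter's most_common, and -1 for errors) could look roughly like the sketch below. This is not the code from the hackathon: the function names, the alphabets, and the exact meaning of the s/p flag are assumptions made for illustration.

```python
from collections import Counter

# Assumed alphabets (hypothetical choice for this sketch).
NUCLEOTIDES = set("ACGTUN")
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard one-letter codes


def count_residues(sequence, kind):
    """Count each letter; kind is 's' (sequence) or 'p' (protein).

    Returns -1 on any error, following the "-1 for all errors" note.
    """
    if kind == "s":
        alphabet = NUCLEOTIDES
    elif kind == "p":
        alphabet = AMINO_ACIDS
    else:
        return -1
    sequence = sequence.upper()
    if not sequence or any(ch not in alphabet for ch in sequence):
        return -1
    return Counter(sequence)


def alphabetical(counts):
    """Letters in alphabetical order, using sorted."""
    return sorted(counts.items())


def by_frequency(counts):
    """Letters from most to least frequent, using Counter.most_common."""
    return counts.most_common()


counts = count_residues("GATTACA", "s")
print(alphabetical(counts))
print(by_frequency(counts))
```

Keeping each step in its own function matches the note about being able to call any function on its own.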

  • 2:53, woke up after a nap and the code is going along nicely, we only have to get the histograms/boxplots/surprise graphs set up
  • 5:03, third job is halfway through and we are taking a 4 hour break before we tackle the remainder of the job
  • 7:23, people starting to slowly wake up early to get increased coding time
  • 9:25, coding fully engaged and people are trying their hardest to get the code working
  • 11:18, Chickpeas on a fantastic losing streak due to the last minute actions of the savior Mr. Ahmet becoming the MVP for the Beans team
  • 11:57, eating breakfast/lunch is mostly done
  • task 4 consists of writing a large piece of pseudocode for our sequence analysis, which involves assembling Illumina reads
  • short warm up practice with a paper cutout sequence where we tried to match the overlapping bases to obtain the original sequence
  • after warm up we moved on to code the beginning of exercise 4 in IPython, which consisted of many small functions
  • photo: struggling with the 4th task
  • we took a break where we waited for our code to be modified by the IPython server
  • our code turned out to be too slow, so we spent some time speeding it up by replacing re.search, removing some unused elements, and condensing some functions
  • while we waited for the code to run we watched three TED Talks: What's invisible? More than you think by John Lloyd (https://www.youtube.com/watch?v=8EUy_82IChY), How to start a movement by Derek Sivers (http://www.ted.com/talks/derek_sivers_how_to_start_a_movement?language=en#t-21384), and What if 3D printing was 100x faster? by Joseph DeSimone (http://www.ted.com/talks/joe_desimone_what_if_3d_printing_was_25x_faster?language=en)
  • at around 8 PM we started visualizing our data for the 4th task in Cytoscape, mapping out the relationships of the sequences to each other
  • 21:18, we are dealing with a problem in the code where we try to replace repeat sequences and print out the box plot of the result
  • around 10:30 PM we finished our dinner and moved on to finish task #4
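
The paper cutout warm-up above, matching overlapping bases to recover the original sequence, can be sketched as a greedy overlap merge. This is only an illustration under assumptions: the function names, the min_overlap value, and the toy reads are made up here, and the code we actually wrote may have worked differently. Real Illumina reads also contain errors and repeats that this sketch ignores.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when the overlap is shorter than `min_overlap`.
    """
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(reads):
            for j, right in enumerate(reads):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left to join
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Toy reads cut from the made-up sequence ATGGCGTACCTGA
print(greedy_assemble(["GCGTACC", "TACCTGA", "ATGGCGT"]))
```

Comparing every read against every other read is quadratic in the number of reads, which gives a feel for why a naive Python version gets slow on a full data set.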

  • 8:39, the cabin is waking up and getting ready to work for some hours
  • started the day by plotting the joined sequences into a histogram and boxplot
  • copying our joined sequences into BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi) for task 5
  • photo: matching one of our joined sequences with a gene
  • code summary by Mr. Ahmet, where he talked about the functions in the code and how we improved its speed by 50%. He also covered some alternatives for verifying our code and how de novo assembly code is usually ported to C for much faster processing (it took us over 6 hours to fully process our data). Writing the code took longer than expected because we faced many unknowns, such as the Phred quality scoring system and how to convert it to code. Other points: there is not much that can be done when alleles are involved, lack of coverage is another issue when the data is used for clinical purposes, and writing code as a group is always more difficult than coding solo
  • 11:19, the hackathon is completed and the winning team is... Friendship! (both teams decided to work together to achieve the end goals)
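
Since converting the Phred quality scoring system to code was one of the unknowns mentioned in the summary above, here is a minimal sketch of the idea. It assumes the common Phred+33 encoding (each quality character's ASCII code minus 33 is the score Q) and uses the standard relation P = 10^(-Q/10) for the error probability. The function names are hypothetical, and the offset should be checked against the actual FASTQ file.

```python
def phred_scores(quality_line, offset=33):
    """Turn a FASTQ quality string into a list of integer Phred scores.

    Assumes Phred+33 encoding unless another offset is given.
    """
    return [ord(ch) - offset for ch in quality_line]


def error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def mean_quality(quality_line, offset=33):
    """Average Phred score of one read, or -1 for an empty quality line."""
    scores = phred_scores(quality_line, offset)
    if not scores:
        return -1
    return sum(scores) / len(scores)


quality = "!5I"  # made-up quality string for illustration
print(phred_scores(quality))
print(mean_quality(quality))
print(error_probability(20))
```

A score of 20 therefore means roughly a 1 in 100 chance of a wrong base call, and 30 means roughly 1 in 1000.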

Again, for the interested among you, all of the code we worked on can be found on our wiki at
