“I want to be a (better) data scientist, but working alone through online courses doesn’t cut it.”

Data Science Retreat brings together top data scientists and participants who want to grow exceptionally fast.

  • Full-time program for 3 months in Berlin, Germany
  • Next batch starts April 18, 2016
  • LATEST: 83% of Dec15 graduates secured a job offer within 2 months
  • 50% have multiple job offers

DSR for data scientists

The 3-month Retreat

Our approach to teaching is highly opinionated, and is based on our extensive experience in state-of-the-art machine learning, data science, and big data engineering. We know what works in face-to-face instruction, and what doesn’t (we could write a book on the latter!). If you want to learn this stuff as fast as humanly possible, this is your best bet: small class sizes, expert mentors who can teach, and battle-tested material. Our classes are very much hands-on, and are taught by senior data scientists and senior data engineers with many years of practical experience.


Tuition: 10,000 € to be paid within 2 weeks of acceptance
Where: Full-time for 3 months in Berlin, Germany
Class size: 5-10 students

Is this for me?

Data Science Retreat is for people with previous work experience and at least some basic knowledge of machine learning. You need to have spent at least 1000 hours programming, even in a language that is not ‘data science friendly’. You will spend most of your time here programming, so you must be confident in your skill. You do not need a deep mathematical background, although some techniques do rely on knowing some linear algebra and probability theory.

Much of the material in this track is similar to what we teach in our retreat for data engineers, but the emphasis is on finding questions you can answer with modern techniques, and on producing the best performing model humanly possible. You will prepare communications to different audiences (we train you, but you may want to feel confident speaking in public before applying). 

Portfolio project

The most important part of your retreat experience will be your portfolio project. We’ll be around to give advice when you get stuck, but you’re going to build something amazing, on your own. Think of this as the demo that gets you a new job.

Master classes

Good engineering practices in Python, creating and consuming APIs

Amelie Anglade

A crash course in Python, including the language and its ecosystem. We also cover the basics of working with Git, writing tests, building packages, and creating and consuming APIs.

Machine learning overview: proficiency with core methods

Jose Quesada

This two-day workshop is designed to teach developers who have taken online courses on machine learning how to implement models that perform well. It is not an advanced course; it’s designed to kill many misconceptions that people have about core machine learning models.

Advanced machine learning: model pipelines

Pawel Jankiewicz

Once your company starts fitting models, methodology matters. It is easy to simply pile up complexity without managing it. Fortunately, we now have best practices (and libraries) that make it easy to iterate over preprocessing, model families, and parameters.

NumPy, SciPy, Pandas, Scikit-learn

Amelie Anglade

NumPy and SciPy took Python from a general programming language to a very powerful, matrix-oriented one. Pandas brought data.frames to Python (the data.frame is one of the core concepts in modern data analysis). Building on these data structures, scikit-learn brought killer implementations of best-of-breed algorithms, all under a standardized library. Together, these packages have made Python the programming language of choice for many data scientists.

Big data processing with Spark

David Anderson

Originally a research project at UC Berkeley, Spark is now a top-level Apache project and the fastest-growing open source project in history. In this master class, developers will use hands-on exercises to learn how to work with the relevant parts of the Hadoop ecosystem, and the principles of Spark programming.

Real-time stream processing with Spark, Kafka, and Elasticsearch

David Anderson

In this hands-on workshop you will build a real-time data pipeline that receives data from Twitter, stores it in Kafka, processes the stream using Spark, and stores the processed stream in Elasticsearch.

Real-world recommender systems

David Anderson

Recommendations are widely used in many industries, such as e-commerce, jobs, music, and social media. This course goes beyond the basics and emphasizes solutions to problems you will face when your business deploys a recommender system.

Deep dive into R

Marek Gagolewski

Many people think of R as a very messy tool for doing data science. Sorry, but they're wrong: R is a general-purpose functional programming language, which is governed by a few clear rules. This course will expose you to the depth and breadth of R so that you will have a deep understanding of how everything works, and why. Even if you have no particular experience in computer programming, you will notice how easy it is to write powerful R scripts.

Speeding up your R & Python models: Rcpp and Cython

Marek Gagolewski

When writing high-quality data analysis software in R or Python that will be used by other people, you should use a compiled language if you aim to deliver the best possible performance. The aim of this course is to give you a working introduction to best practices of C++ programming, data structures, and algorithms so that you can achieve these goals.

Optimizing data structures and memory usage: advanced data.table

Arunkumar Srinivasan

After participating in this masterclass you should:

  • Understand the data.table query language deeply and completely
  • Be able to do operations in memory that were previously unthinkable
  • Know how to work with time slices of events, and do rolling joins
  • Understand moving windows of time slices

Deep learning for image classification

Ludwig Schmidt-Hackenberg

The aim of this two-day hands-on masterclass is to introduce deep learning and give insight into the hype. In the tutorial Ludwig will give an overview of the existing techniques and applications, show how they differ from traditional approaches, and discuss the limitations of deep learning. As part of the tutorial we will build a deep learning system from the ground up, and train it.

Our Mentors

DSR is the only program worldwide whose mentors are at the Chief Data Scientist and CTO level. They are invested in your progress, and will train you to have the right mindset, to solve business questions with technology, and to advise leadership. Some mentors teach, others only provide advice during portfolio project time.

Pere Ferrera

Pere is co-founder and CTO of Datasalt. He’s a core committer in two Hadoop-based open-source projects, Splout SQL and Pangool. Splout provides a SQL view over Hadoop's Big Data with sub-second latencies and high throughput. Pangool is an improved low-level Java API for Hadoop based on the Tuple MapReduce paradigm (ICDM 2012). Pere is an early adopter of Hadoop and has worked on Big Data projects since 2008. He’s also the organizer of Big Data Beers Berlin.

Adam Drake

Adam Drake is Chief Data Officer at one of the world's most successful online travel companies. He has been in technology roles for over 15 years in a variety of industries, including online marketing, financial services, healthcare, and oil and gas. His background is in Applied Mathematics, and his interests include online learning systems, high-frequency/low-latency data processing systems, recommender systems, distributed systems, and functional programming (especially in Haskell).

Mikio Braun, PhD

Mikio is a data science researcher and blogger. He was previously co-founder of streamdrill, a company focussing on real-time data analysis. He is part of the Berlin Big Data Competence Center, which aims to bring together machine learning and scalable technologies to create the next generation of Big Data infrastructure. He is also the author of jblas, a fast matrix library for Java which is used by PayPal, Breeze, and Apache Spark.

Jose Quesada, PhD

Jose Quesada is the founder and director of DSR. Jose helps others to decide better, do better, or be better through data. Like everyone else, he doesn’t know what data science really is, but suspects it has to do with predicting the future before it catches you empty-handed. He has a PhD in machine learning and has worked at top research labs (University of Colorado Boulder, Carnegie Mellon, Max Planck Institute). Previously he worked as a data science consultant, specializing in customer lifetime value, and as the head data scientist for GetYourGuide.

Trent McConaghy, PhD

Trent is co-founder and CTO of ascribe, which uses modern crypto, ML, and big data to tackle challenges in digital property ownership. His two previous startups applied ML in the enterprise semiconductor space: ADA was acquired in 2004 and Solido is going strong. He has an engineering PhD in applied ML from KU Leuven, Belgium. His interests include large-scale regression, automating creativity, anything labeled "impossible", and thousand-fold improvements. He was raised on a pig farm in Canada.

Arunkumar Srinivasan

Arunkumar Srinivasan is finishing a PhD in Bioinformatics from the Max Planck Institute. He started using R in late 2011 and is coauthor of the data.table R package, which offers fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns and a fast file reader (fread). Arun has a passion for developing tools and algorithms facilitating big-data analyses.

Marek Gagolewski, PhD

Marek has been a true R hacker and enthusiast since the Paleozoic era of R_1.4.0. Author of a best-selling Polish book on R programming and many R packages, including the famous stringi package. Computer programmer since the age of 6 (C64 BASIC, C/C++, assembler, PHP, Java, VHDL, bash, Julia, Maxima, Lisp, Fortran and many others). Marek has a PhD in computer science and specializes in data aggregation, fusion and mining, computational statistics, and uncertainty modeling. Currently an assistant professor and a tutor and mentor at the Warsaw University of Technology, Poland.

Daniel Nouri

Daniel is an expert software engineer, Python programmer, and machine learning specialist. When he's not developing high-performing, end-to-end pattern recognition and predictive analytics systems for his clients, Daniel's learning new tricks to train deep neural networks more efficiently. Through his company Natural Vision, he's been successfully applying deep learning to problems in bioacoustics, computer vision, and text mining.

David Anderson

David is the Head of Big Data Engineering at DSR. He began his career as a senior research scientist at Carnegie Mellon University, Mitsubishi Electric Research Labs, and Sun Labs. His research career focused on tangible user interfaces and real-world applications of machine learning. Since 2005, David has been leading the development of data-intensive applications for companies across Europe, most recently as CTO at RetentionGrid.

100% of participants got multiple interviews out of DSR.

60% of participants had to choose from multiple job offers.

86% of participants got the job they wanted out of DSR.

Why you should do Data Science Retreat

Go faster

There’s plenty of good material online to learn machine learning and data science on your own. We now live in an autodidact’s paradise. The question is, how can you get there faster than everyone else?

Break the “barrier of excellence” that one reaches when learning alone

No matter how many MOOCs you do, there’s a barrier that very few people ever get past. Jump over it.

Build a serious data product

Products are the new CVs. What interviewers really want to see is “What have you done when nobody told you what to do?”

Be surrounded by other people seeking excellence

We accept about 10 people out of the 200 who apply for each batch. They are extremely motivated and have skillsets complementary to yours. Do you want to spend time in the same room with them?

Deep learning at DSR batch 4

Things you didn’t know about Berlin

Inexpensive accommodation

City                                              Number of rooms                Average cost
San Francisco (Zipfian Academy)                   1 bedroom                      $3,213
New York City (Metis; NYC Data Science Academy)   1 bedroom                      $3,039
Berlin (Data Science Retreat)                     1 bedroom + utilities          $726 (658 EUR)
Berlin (Data Science Retreat)                     1 bedroom (in a shared flat)   $497 (450 EUR)

Affordable, high quality of life

  • Unlike many US cities, you don’t need a car to get around in Berlin. The local public transportation system is inexpensive and includes buses, trams, subways, and commuter rail. Plus there are bike paths on most major streets. Bicycles are first-class vehicles.
  • Berlin is surrounded by large forests, and is only 2 hours from the Baltic Sea. Eating out is very affordable (most restaurants have a daily deal at around 5 EUR). The health system is efficient and cost-effective. The top two gym chains each cost less than 20 EUR per month.
  • Nightlife is spectacular: A thriving electronic music scene, parties that begin Friday night and end Monday morning, taking place in green parks, man-made river-beaches, and abandoned basements and buildings.

Startup capital

  • Berlin has a street known as the home of startups: the long Schoenhauser Allee, nicknamed “Silicon Allee”.
  • In 2014, Berlin startups raised $1.8 billion, and they are predicted to raise even more in 2015.
  • Thanks to the strong tech culture here, you can go to a tech meetup with world-class speakers every day.
  • Pretty much everything can be handled in English.
  • The big data and machine learning communities are extremely active.