This lecture introduces theory and techniques for analyzing large volumes of data. Big data is usually created by experiments, observations or humans. Besides its sheer volume, such data can be characterized by the velocity at which it is produced, the variability of its structure, its often suboptimal quality and its inherent value.
We can gain knowledge by analyzing this data using techniques from statistics and machine learning. Global players such as Google and Facebook use these techniques for targeted advertising to optimize revenue. However, the techniques are equally applicable in a scientific context.
In the exercises, selected open source tools such as Apache Pig, Hive, Spark or Neo4j are used to reveal interesting properties of publicly available data sets. The exercises teach the languages R and Python and build upon them.
The lecture is a “Wahlpflichtmodul/Vertiefung” (compulsory elective / specialization module) in the Master of Computer Science. Interested students of other degree programs are also welcome and should contact the organizer.
Attendees are expected to have experience in at least one programming language (e.g., Java). Knowledge of Python, SQL and machine learning is not necessary but helpful.
|Location|DKRZ, room 034|
|Lecture time|Friday 12:15 - 13:45|
|Exercise time|Friday 14:00 - 15:30|
|First meeting|Friday 2017-10-20 12:15|
Note that it is mandatory to subscribe to the mailing list.