Wednesday, 01 February 2017 13:40

Internship: 6 months at Grenoble INP for "An architecture for managing spatio-temporal Big Data at scale"

Title:  An architecture for managing spatio-temporal Big Data at scale (application to IoT)

Specialty: Systems and Software
Keywords: Big Data, Storage, Spatial, Temporal, Hadoop, Drill, Index, Query

Prof. Christine Collet, Grenoble INP
Dr. Houssem Chihoub, INPG Entreprise SA

Grenoble Informatics Laboratory (LIG) is one of the largest computer science laboratories in France. It is structured as a Joint Research Center (French Unité Mixte de Recherche - UMR) founded by the following institutions: CNRS, Grenoble INP, Inria, and Université Grenoble Alpes. The internship will be carried out at the LIG laboratory. It is proposed and funded in the context of the ENEDIS industrial chair of excellence on smart grids, in partnership with Grenoble INP.


Nowadays, data come in huge volumes from various sources: web data, IoT data, scientific data, etc. This data deluge, more commonly known as Big Data, has introduced unprecedented performance and scalability challenges for data management and processing systems. In many cases, data are characterized by two very important dimensions: (geographical) space and time. As a result, many analytics applications and algorithms rely on spatial and/or temporal queries and computations. Over the years, spatio-temporal querying has been studied extensively in the literature. However, most of the introduced techniques fail to scale to Big Data volumes, or to integrate efficiently with large-scale data processing infrastructures (such as Hadoop).

This project is research-oriented. Its goal is to design and build a scalable architecture to store, manage, and process spatio-temporal data.
It will also deliver a high-performance, scalable query engine for spatio-temporal data. The architecture will rely on the Hadoop Distributed File System (HDFS) to store data. Additionally, a hybrid solution will be introduced, combining a scalable, distributed query engine with data indexing and partitioning approaches that leverage spatio-temporal data characteristics. The latter will aim to provide fast data access and fast computation while minimizing data transfer and enhancing data locality. The proposed architecture will then be validated and evaluated on IoT data: a use case consisting of querying current and historical data from sensors deployed nation-wide on the power grid in France.
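The indexing and partitioning scheme is deliberately left open here, as designing it is part of the project. One common starting point, sketched below in Python purely for illustration (the function names and bucket sizes are assumptions, not part of this project's design), is to build a composite key from a coarse time bucket and a Z-order (Morton) encoding of the coordinates, so that records close in space and time share key prefixes and land in the same partition:

```python
from datetime import datetime, timezone

def zorder_key(lat: float, lon: float, bits: int = 16) -> int:
    """Interleave the bits of normalized latitude and longitude
    (a Z-order / Morton code), so nearby points share key prefixes."""
    # Normalize each coordinate to an integer in [0, 2^bits).
    y = int((lat + 90.0) / 180.0 * ((1 << bits) - 1))
    x = int((lon + 180.0) / 360.0 * ((1 << bits) - 1))
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)       # even bits: longitude
        key |= ((y >> i) & 1) << (2 * i + 1)   # odd bits: latitude
    return key

def partition_key(lat: float, lon: float, ts: datetime,
                  hours_per_bucket: int = 24) -> tuple:
    """Composite spatio-temporal key: coarse time bucket first,
    then the spatial Morton code. Records close in both space and
    time map to the same partition."""
    bucket = int(ts.timestamp() // (hours_per_bucket * 3600))
    return (bucket, zorder_key(lat, lon))
```

Such a key can serve both as an HDFS directory layout (time bucket, then spatial prefix) and as a pruning predicate for range queries, since a spatio-temporal window translates into a small set of key ranges.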

The project will consist of the following steps: 
- Study of the state of the art (bibliographical references, latest technologies): spatio-temporal indexing, Big Data management systems, distributed data querying, etc.
- Getting familiar with systems and frameworks: Hadoop, Spark, Apache Drill, MongoDB
- General design specification
- Implementation 
- Experimental validation and evaluation (IoT data)
- Documentation 

Technologies Used
Hadoop and HDFS, Spark, Apache Drill, MongoDB.
Required Skills
•	MSc / Master (Bac+5) in computer science or engineering, or equivalent
•	Knowledge of Big Data processing and management and distributed systems: Hadoop, Spark, NoSQL (MongoDB, HBase, Cassandra), distributed file systems, Apache Drill, and SQL
•	Strong programming skills in one or more languages (Java, C, C++, Python, Scala …)
•	Strong English skills are a plus

1500 € gross / month

6 months

Please send CV and cover letter to Christine Collet and Houssem Chihoub.