Belgium
Bell Labs Internship on a geo-Distributed Streaming System using Trusted Execution Environments (PhD)

Your Role 

You will take part in a project at the intersection of research in confidential computing (using TEEs) and distributed computing (using stream processing frameworks). We welcome applicants with experience in either. During this internship, you will work closely with us and become familiar with industrial applications of the research work. We aim to find a fit such that this internship can become an integral part of your PhD (e.g., as an application case) and lead to a paper written in collaboration between you and us.

Framework explanation

With the rise of the Internet of Things (IoT), many sensors produce vast amounts of data. To reduce latency and bandwidth requirements, systems that process this data are typically geo-distributed, with processing occurring on the device, in a nearby edge device, in a regional edge cloud, and in a central cloud. This has led to the emergence of data processing platforms where multiple stakeholders contribute data or provide services that use the data. 

This involves collaboration across many parties: an infrastructure provider hosts the platform, different data owners contribute their data, and service providers develop and deploy services. For example, in an intelligent transport system, a road operator is an infrastructure provider that processes data from road users (data owners), used by service providers such as emergency services, mapping applications, or road maintenance. Similarly, an electricity grid operator (infrastructure provider) collects data from electricity meters installed in households and companies (data owners), which is processed by service providers such as governments, electricity producers, and electricity consumers.

In these scenarios, all parties benefit from the collaboration, but in practice it is often hindered by concerns about trust. The data owners want to keep control of where and by whom their data is used, which requires data confidentiality. The service providers want to ensure their computations run as specified, which requires code integrity, and may need their proprietary code (e.g., ML models) to be kept confidential, which requires code confidentiality. The infrastructure provider must ensure these guarantees to both parties.

A Trusted Execution Environment (TEE), such as Intel SGX, is a secure area of a processor in which code and data can be loaded. It guarantees confidentiality, i.e., that the code and data cannot be accessed from outside the TEE, and integrity, i.e., that the code is executed as specified. This is implemented in hardware and guaranteed by the hardware vendor. In this project, we will leverage TEEs to provide the required trust guarantees for a distributed data processing platform. Such an application would consist of TEEs running in various locations – edge, edge cloud, cloud – that each contain a part of a distributed stream processing platform (e.g., Apache Flink, Apache Spark).

Your project

In this project, you will explore and solve the challenges related to building such a system. We plan to adapt an existing distributed stream processing framework to run within TEEs across distributed nodes, including secure communication. We envision you will need to design a protocol that can convince a data owner that only pre-approved services can access their data. Moreover, it should remain possible to exploit typical features of stream processing platforms such as scalability (adding/removing nodes), failure recovery, and dynamic migration of workloads while maintaining the trust guarantees. This will lead to implementing a TEE-based “trusted orchestrator” such that (1) service providers can deploy and upgrade services, (2) the infrastructure provider can scale and adapt their nodes dynamically, and meanwhile (3) data owners can be certain their data remains confidential.
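To make the "only pre-approved services can access their data" requirement more concrete, here is a minimal Python sketch of the check a data owner might run before releasing data to a node. It is an illustration under loud assumptions, not the project's protocol (which is yet to be designed): the names `AttestationReport`, `measure`, `make_report`, and `owner_accepts` are hypothetical, and an HMAC with a shared key stands in for the hardware vendor's signature, whereas real TEE attestation (e.g., in Intel SGX) relies on asymmetric, vendor-rooted signatures.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class AttestationReport:
    """Hypothetical, simplified attestation report produced inside a TEE."""
    measurement: str  # hex digest of the service code loaded in the TEE
    nonce: bytes      # freshness challenge chosen by the data owner
    mac: bytes        # ASSUMPTION: HMAC stand-in for a vendor signature


def measure(code: bytes) -> str:
    """Hash the service code; stands in for the TEE's code measurement."""
    return hashlib.sha256(code).hexdigest()


def make_report(code: bytes, nonce: bytes, vendor_key: bytes) -> AttestationReport:
    """Simulate the TEE binding the code measurement to the owner's nonce."""
    measurement = measure(code)
    mac = hmac.new(vendor_key, measurement.encode() + nonce, hashlib.sha256).digest()
    return AttestationReport(measurement, nonce, mac)


def owner_accepts(report: AttestationReport, nonce: bytes,
                  approved: set, vendor_key: bytes) -> bool:
    """Data owner's check: authentic report, fresh nonce, pre-approved code."""
    expected = hmac.new(vendor_key, report.measurement.encode() + nonce,
                        hashlib.sha256).digest()
    return (hmac.compare_digest(expected, report.mac)
            and report.nonce == nonce
            and report.measurement in approved)


if __name__ == "__main__":
    key = b"demo-vendor-key"                  # illustrative only
    approved = {measure(b"traffic-stats-v1")}  # services the owner approved
    nonce = b"challenge-001"
    good = make_report(b"traffic-stats-v1", nonce, key)
    bad = make_report(b"unapproved-service", nonce, key)
    print(owner_accepts(good, nonce, approved, key))
    print(owner_accepts(bad, nonce, approved, key))
```

In this sketch the data owner would release a data decryption key only when `owner_accepts` returns true. Scaling, failure recovery, and workload migration would each require re-running such a check for the new or relocated node, which is one reason a trusted orchestrator is envisioned.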

Additional important information

The duration is flexible and to be agreed (typically 3-4 months). 
The starting time is flexible.
You must relocate to Belgium for the duration of the internship. Note that hybrid work is possible.
This is a paid internship.

You are a student enrolled in a PhD program, or you hold a PhD, in Computer Science or Engineering. We are open to starting PhD students, students who are more advanced in their PhD, or post-docs.
You have experience related to either stream processing platforms (Apache Flink, Spark, Beam) or trustworthy computing (Intel SGX/TDX, ARM TrustZone, AMD SEV).
Previous (or pending) publications in related domains (confidential computing, distributed computing) are a strong plus.
Programming skills in Python, JavaScript/TypeScript, or Java are a plus.
You are fluent in English, spoken and written.

You will explore how TEEs can be used to provide confidentiality and integrity guarantees for distributed stream processing systems.
You will design and implement a protocol or algorithm to provide these guarantees.
You will implement a prototype on top of an existing stream processing platform. The processing platform can be an open-source platform or our in-house platform.
You will implement benchmarks and evaluate your results.
You will have the opportunity to join a project that might lead to a publication at a top academic venue.
---