Introduction to Sample Research Projects

Project 1: Privacy Preserving Deep Learning Models

Machine learning models, especially deep learning models, have been widely applied to various types of data and will further transform many aspects of our daily lives. However, these popular machine learning algorithms were typically developed without a thorough consideration of data security and privacy. For big data with high volume, variety, and velocity, existing data security solutions fall short in efficiency, flexibility, and scalability. These security and privacy challenges either cause critical technology to malfunction (e.g., under adversarial attack, a self-driving car's computer vision model could misclassify a stop sign as a yield sign) or undermine the promising future that big data and machine learning can bring us (e.g., with privacy issues unsolved, human genome-based personalized medicine and medical practice will be delayed).

In this project, our goal is to address data privacy issues when a deep learning model is trained in the cloud. We aim to design and develop a new protocol that preserves the privacy of the training and test input data under the semi-honest security model, while guaranteeing a reasonable level of prediction accuracy and efficiency. The project uses Keras/TensorFlow and the cryptographic library SEAL.
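As a minimal illustration of the underlying idea (using a toy, pure-Python Paillier cryptosystem in place of SEAL, with made-up weights and small illustrative primes), a server can evaluate a linear model on encrypted features without ever seeing them:

```python
import math
import random

# Toy Paillier keypair (illustrative ~20-bit primes; real keys need ~2048 bits)
p, q = 1000003, 1000033
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)               # valid because we fix g = n + 1

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    u = pow(c, lam, n2)
    return ((u - 1) // n) * mu % n

# Additive homomorphism: Enc(a)*Enc(b) = Enc(a+b) and Enc(a)^k = Enc(k*a),
# so a server can compute a linear score w.x directly on ciphertexts.
weights = [3, 5, 2]                # server's model (made up)
features = [7, 1, 4]               # client's private input (made up)
enc_feats = [encrypt(v) for v in features]
enc_score = 1
for w, c in zip(weights, enc_feats):
    enc_score = (enc_score * pow(c, w, n2)) % n2
assert decrypt(enc_score) == 34    # 3*7 + 5*1 + 2*4
```

Real private inference stacks extend this pattern to full network layers and use lattice-based schemes such as those in SEAL, which also support approximate arithmetic on real numbers.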

Faculty Advisor: Dr. Tingting Chen

Project 2: Adversarial Machine Learning and Defense

Since the advent of machine learning in computer vision, image classification software has surpassed human capabilities and enabled new technologies, including facial recognition authentication, self-driving cars, and smart security cameras. However, a unique challenge threatens these technologies: the existence of images that appear normal to humans but reliably fool image classifiers. Such carefully manipulated inputs are called adversarial examples. In this project, we aim to explore different ways of generating adversarial examples and to study countermeasures to defend against such attacks.
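One canonical way to generate adversarial examples is the fast gradient sign method (FGSM). Here is a toy, pure-Python sketch against a fixed logistic-regression "classifier" (weights, input, and epsilon are invented for illustration):

```python
import math

# A fixed logistic-regression classifier standing in for an image model.
w = [1.5, -2.0, 0.5]
b = 0.1

def predict(x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 / (1 + math.exp(-z))          # P(class = 1)

x = [1.0, 0.2, 0.8]                        # clean input, classified as class 1

# For the true label y = 1, the gradient of the loss w.r.t. the input is
# (sigmoid(z) - 1) * w; FGSM perturbs the input in the sign of that gradient.
eps = 0.5
grad_sign = [math.copysign(1.0, (predict(x) - 1.0) * wi) for wi in w]
x_adv = [xi + eps * g for xi, g in zip(x, grad_sign)]
# The perturbed copy crosses the decision boundary while staying eps-close.
```

For deep networks the same one-step construction applies, with the gradient obtained by backpropagation through the model to the input pixels.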

Faculty Advisors: Dr. Hao Ji, Dr. Tingting Chen

Project 3: GPU-accelerated Encryptions and Decryptions of Big Data

Big data processing and analysis algorithms for security protection and privacy preservation face new challenges compared to conventional ones, due to the huge volume and unit size of the data. One challenge in cryptography-based approaches to big data privacy protection is efficiency. Established cryptographic algorithms usually require large key sizes (e.g., 1024 to 2048 bits) to ensure a given level of security, so the time to encrypt and decrypt is significant compared to other lightweight privacy protection approaches.

In this research project, our objective is to utilize Graphics Processing Units (GPUs) to speed up the encryption and decryption of big data. This work is meaningful in cloud environments, where data owners encrypt their data before storing it in the cloud, and the cloud processes queries on the encrypted data. The REU students will learn to program on CUDA, the parallel programming and computing platform that enables general-purpose processing on GPUs. They will then apply these skills to accelerate cryptographic algorithms (e.g., Paillier and ElGamal) for different computation tasks.
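A minimal, pure-Python sketch of ElGamal (toy 20-bit modulus, no GPU) makes the acceleration target visible: the cost is dominated by modular exponentiations that are independent across ciphertexts, exactly the kind of work a CUDA kernel can batch:

```python
import random

# Toy ElGamal over Z_p* (illustrative modulus; real deployments use
# ~2048-bit groups, which is what makes GPU-parallel modexp worthwhile).
p = 1000003
g = 2
x = random.randrange(2, p - 1)             # private key
h = pow(g, x, p)                           # public key

def encrypt(m):
    k = random.randrange(2, p - 1)         # fresh per-message randomness
    return pow(g, k, p), (m * pow(h, k, p)) % p

def decrypt(c1, c2):
    return (c2 * pow(c1, p - 1 - x, p)) % p    # c1^(-x) via Fermat

# Multiplicative homomorphism: the componentwise product of two
# ciphertexts decrypts to the product of the plaintexts.
a1, a2 = encrypt(6)
b1, b2 = encrypt(7)
prod = decrypt((a1 * b1) % p, (a2 * b2) % p)
assert prod == 42
```

On a GPU, the three `pow(..., p)` calls per message would be replaced by a batched modular-exponentiation kernel applied to thousands of data blocks at once.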

Faculty Advisor: Dr. Tingting Chen

Project 4: Privacy Preserving Big Data Matching Among Medical Institutions

In biomedical research, collaborations among multiple medical institutions are often needed due to the large quantity and distributed nature of the available data. However, each institution is concerned with the privacy of its data. We aim to provide privacy-preserving data matching algorithms for potential collaborators to find common features/interests in their data and establish a joint workforce.

We plan to utilize homomorphic cryptographic schemes to enable private matching in the ciphertext space without the help of a third party. REU students, with guidance, will complete this research project in three steps: 1) split the homomorphic cryptographic scheme into two programs, one for each medical institution; 2) design algorithms that run at the two institutions and jointly finish the matching task through rounds of communication without revealing private information; 3) if time allows, extend the algorithm to support matching among more than two parties.
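One possible instantiation of the two-party matching step (a Diffie-Hellman-style private set intersection built on commutative exponentiation; not necessarily the scheme the project will adopt, and with made-up record names) can be sketched as:

```python
import hashlib
import random

# Commutative "encryption": E_a(v) = H(v)^a mod p, so E_a(E_b(v)) == E_b(E_a(v)).
p = 2**127 - 1                     # prime modulus (toy choice)

def h(v):
    # Hash a record identifier into the multiplicative group mod p.
    return int.from_bytes(hashlib.sha256(v.encode()).digest(), "big") % p

a = random.randrange(2, p - 1)     # institution A's secret exponent
b = random.randrange(2, p - 1)     # institution B's secret exponent

A = {"BRCA1", "TP53", "EGFR"}      # A's private records (made up)
B = {"TP53", "KRAS", "EGFR"}       # B's private records (made up)

# Round 1: each side sends its hashed, exponentiated records.
msg_a = [pow(h(v), a, p) for v in A]
msg_b = [pow(h(v), b, p) for v in B]

# Round 2: each side exponentiates the other's values; matching records
# collide as H(v)^(ab), while non-matches reveal nothing useful.
double_a = {pow(t, b, p) for t in msg_a}
double_b = {pow(t, a, p) for t in msg_b}

common = double_a & double_b
assert len(common) == len(A & B)   # two records match
```

This follows the project outline: each institution holds one program (one exponent), and the match is found through two rounds of communication with no third party.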

Faculty Advisor: Dr. Tingting Chen

Project 5: Protect Individual Patients’ Privacy in Genomic Data Sharing

In the healthcare setting, privacy issues arise when individual patients share their genomic data with their doctors or with a third party, e.g., to have their DNA sequenced. The goal of this project is to design technical solutions that transform individual patients' genomic data either into cryptographic ciphertext or into pseudo-anonymous forms. For cryptographic approaches, we need to strike a balance between the functions that can be performed on the ciphertexts and the level of privacy the ciphertexts maintain. For anonymization approaches, the challenges are to defend against re-identification attacks and privacy leakage through publicly available information, such as genealogy databases.

We plan to guide the REU students to apply the multi-level threshold secret sharing technique to genomic data before it is stored on or shared with a third party. Our previous work on this topic in the general cloud setting lays a good foundation to start with.
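The plain (t, n) Shamir scheme that underlies multi-level threshold sharing can be sketched as follows (toy field size and made-up data; the multi-level variant layers several thresholds on this same mechanism):

```python
import random

# (t, n) Shamir secret sharing over GF(p): any t of n shares reconstruct
# the secret, and fewer than t shares reveal nothing about it.
p = 2**61 - 1                      # prime field (toy size)
t, n = 3, 5

def share(secret):
    # Random degree-(t-1) polynomial with the secret as constant term.
    coeffs = [secret] + [random.randrange(p) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, i, p) for i, c in enumerate(coeffs)) % p
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation at x = 0.
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % p
                den = den * (xi - xj) % p
        secret = (secret + yi * num * pow(den, -1, p)) % p
    return secret

snp = 123456789                    # e.g., an integer-encoded genome segment
shares = share(snp)
assert reconstruct(shares[:3]) == snp      # any t shares suffice
assert reconstruct(shares[1:4]) == snp
```

In the genomic setting, each third party would hold only its own shares, so no single storage provider can recover the patient's sequence alone.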

Faculty Advisor: Dr. Tingting Chen

Scalable and Usable Privacy-Preserving Federated Search (PPFS) in Cloud Environments

In a cloud computing environment such as Hadoop with MapReduce, we focus on a single but critical operation: private set intersection, i.e., finding the common data items in two or more datasets owned by different parties without revealing any sensitive items in the rest of each dataset. As an example application, the Department of Justice and the Department of Homeland Security may want to find a common group of people, with specific traits in their profiles, in their respective databases, while revealing only partial information about each person. Such operations on datasets are often referred to as privacy-preserving federated search (PPFS).

In the following two projects, Project 6 and Project 7, we explore the innovative design and development of security extensions to the Hadoop (MapReduce/HBase) platform to enable privacy-preserving federated search environments. These projects are exciting opportunities to learn the MapReduce and HBase platforms and to apply cryptographic algorithms in designing and developing the extensions.

Project 6: Design and Implementation of a Privacy-Preserving Computation Layer

In this project, the students will extend the MapReduce framework to enable privacy-preserving computation. The fundamental design question is where to place the critical components of the framework, i.e., the JobTracker, as it controls all aspects of job execution. The extended framework will require functionality such as efficiently distributing the workload to TaskTrackers based on the sensitivity of the data they handle.
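A hypothetical sketch of such sensitivity-aware scheduling (all names, tags, and data are invented; the real extension would live inside the Hadoop JobTracker) might look like:

```python
from collections import Counter, defaultdict

# The "JobTracker" routes each input split to a private- or public-cloud
# TaskTracker pool based on its sensitivity tag, then reduces as usual.
splits = [
    ("public",  "flu flu cold"),
    ("private", "hiv flu"),
    ("public",  "cold cold"),
]

def job_tracker(splits):
    pools = defaultdict(list)
    for tag, data in splits:
        pools[tag].append(data)    # private splits never leave their pool
    return pools

def map_task(data):
    return Counter(data.split())   # word-count mapper as a stand-in job

def reduce_tasks(counters):
    total = Counter()
    for c in counters:
        total += c
    return total

pools = job_tracker(splits)
# Each pool's map tasks run on trackers matching its sensitivity level;
# only aggregated map outputs flow to the reduce phase.
partials = [map_task(d) for pool in pools.values() for d in pool]
result = reduce_tasks(partials)
```

The design choice mirrored here is that sensitivity routing happens at scheduling time, before any task reads data, rather than inside the map function.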

Faculty Advisor: Dr. Mohammad I. Husain

Project 7: Design and Implementation of Privacy-Preserving Storage Layer

In this project, the students will design and implement an extended version of HBase so that it can seamlessly process hybrid datasets containing both public and private (sensitive) data. Ideally, the DHS private cloud and the public/other-agency cloud will run unmodified versions of HBase and communicate with the federated cloud, which runs the extended version. The federated cloud will act as a proxy that communicates with the storage managers of both parties and coordinates the processing of public and private data.

Faculty Advisor: Dr. Mohammad I. Husain

Project 8: Measuring Security Compliance in Hybrid-Cloud Environments

Hybrid clouds consist of on-site, private-cloud, and public-cloud computing resources. Creating a set of security policies and technical controls that allows for consistent cybersecurity compliance across differing technologies is a challenge. The challenge is exacerbated when each computing landscape is managed by a separate entity with potentially inconsistent goals: public-cloud operators may focus on optimizing scalability, while the owner of the data may wish to optimize privacy and security controls.

In this research, our objective is to explore methods to independently measure, assess, and report the cybersecurity posture of the same systems under different operating conditions (in a private cloud vs. a public cloud). Of particular importance will be assessing security compromises that occur due to upgrades and changes in public-cloud environments, which are typically out of the view of the data owners. In this project, students will focus on three objectives: 1) research the compliance literature and create a wireframe mockup for a compliance dashboard; 2) select five systems within the Student Data Center and develop compliance metrics and testing tools that measure compliance via both internal configurations and external examination (i.e., penetration testing); 3) use open-source dashboard software and the wireframe mockup to implement a compliance monitoring system that reports the outcomes of step 2 in real time.

Faculty Advisor: Dr. Ronald E. Pike

Project 9: Measuring Security Compliance when Systems Migrate to a New Platform

Systems are sometimes moved from a local computing platform to a previously untested public-cloud platform, or from one public-cloud platform to another. This issue is especially relevant during a disaster, when systems may be moved to a new cloud platform with or without the data owner's knowledge, and while attention may be diverted.

When systems are moved to a new cloud platform, internal compliance controls are typically not useful because the supporting systems differ from the initial design. However, external controls (penetration testing, etc.) can still provide a measure of compliance evaluation. In this project, students will focus on three objectives: 1) enhance external testing to map compliance objectives to external outcomes, so that a failed external test returns a set of likely compliance issues to resolve; 2) create a solution resolution process with API handles/calls that alert external trouble resolution systems (SIEM or network management solutions), ensuring that compliance system failures are reported to the appropriate entities; 3) develop APIs that connect to an external log aggregation service to ensure all system logs are preserved.

Faculty Advisor: Dr. Ronald E. Pike

Project 10: Measuring and Maintaining Security Compliance in Live Cloud Migration

A great deal of work is currently being done in network function virtualization, including live migration of systems between private- and public-cloud environments. During a live migration, system uptime is maintained while the system is being moved. Changes within the software-defined network forward data to the new target system, but the data may be channeled back to the initial system if the required functionality is not yet available.

In this research project, our objective will be to model current best practices for live migration of systems and to develop a process to assess compliance during the migration. In this project, students will focus on three objectives: 1) determine and examine best-practice methodologies for live migration; 2) develop a procedure for monitoring compliance throughout the migration process; 3) develop API handles/calls to interface with trouble resolution and external logging systems.

Faculty Advisor: Dr. Ronald E. Pike