Cal-Bridge

UC Santa Barbara

Name: Arpit Gupta
Title: Curate Representative and Longitudinal Broadband Quality Datasets
Description: In 2010, the United Nations recognized internet access as a basic human right, a recognition reinforced by the COVID-19 pandemic's spotlight on its importance. However, a 'digital divide' persists, with unequal internet access. Stakeholders are addressing this with subsidy programs, rate regulations, and infrastructure funding such as the $42.5 billion BEAD program. Policymakers need precise, granular data on broadband availability, quality, and affordability, down to the level of street addresses. This data informs decisions, such as allocating BEAD funds, by answering three key questions: (1) where is broadband available? (2) what is the quality of the internet service? and (3) how affordable is it, and how does affordability relate to socioeconomic factors? Current broadband data is often sparse, unreliable, and reliant on questionable self-reported information from ISPs, which can lead to flawed funding decisions, especially in underserved areas. For example, the FCC's National Broadband Map relies on ISP self-reports and lacks insight into broadband quality and affordability. This project aims to bridge this data gap, empowering policymakers and civil society organizations to tackle digital inequality. Specifically, it aims to curate representative and longitudinal broadband quality datasets. The next paragraph provides more specific details.
A crucial aspect of determining where to establish internet services is the ability to pinpoint underserved areas: regions where internet service exists but is too slow, consistently unreliable, or both. To extract this information accurately, we need longitudinal measurements obtained from representative home networks that faithfully reflect users' broadband experiences. While existing crowdsourced tools like Speedtest by Ookla and Measurement-Lab offer extensive coverage, their data is biased towards well-served users who only occasionally encounter connectivity issues, and collection captures one-off snapshots rather than longitudinal measurements. To address this, this project will harness our group's ongoing collaborations with crowdsourced measurement service providers, civil society organizations, and local governments to facilitate data collection from underrepresented communities. This setup will serve as the foundation for developing and testing new methodologies aimed at reducing the cost of longitudinal data collection. Specifically, the project will explore machine learning techniques that strike a balance between the duration of speed tests and measurement accuracy. Our approach analyzes spatio-temporal patterns in the sequence of packets during a speed test to identify cases where the initial part of the sequence offers sufficient information to accurately infer the final result. To train the proposed machine learning model, we plan to use the vast repository of publicly accessible speed test data points provided by Measurement-Lab and employ self-supervised learning techniques. (A minimal illustrative sketch of the early-termination idea appears after this project entry.)
Preferred qualifications: Python programming, basics of machine learning desired but not required
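
The following minimal sketch illustrates the early-termination idea described above. All data here is synthetic and the model choice is only a placeholder; the actual project would train on Measurement-Lab speed test traces and explore self-supervised techniques rather than the plain supervised regression shown here.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)

def synthetic_speed_test(duration_s=20):
    """Per-second throughput (Mbps) that ramps up towards a link capacity."""
    capacity = rng.uniform(5, 500)
    ramp = 1 - np.exp(-np.arange(duration_s) / rng.uniform(1, 4))
    noise = rng.normal(0, 0.05 * capacity, duration_s)
    return np.clip(capacity * ramp + noise, 0, None)

# Dataset of full tests; the label is the mean throughput over the whole test.
tests = np.array([synthetic_speed_test() for _ in range(5000)])
labels = tests.mean(axis=1)

# Features: only the first 5 seconds of each test.
early = tests[:, :5]

X_train, X_test, y_train, y_test = train_test_split(early, labels, test_size=0.2, random_state=0)
model = GradientBoostingRegressor().fit(X_train, y_train)
pred = model.predict(X_test)
print(f"MAE when stopping the test after 5 seconds: {mean_absolute_error(y_test, pred):.1f} Mbps")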


Name: Jonathan Balkind
Title: Learning Models of Efficiency of Processor Components
Description: In architecting new computer processors, one of the first steps we must take is modelling potential designs that we are considering building. While the later stages of this process include writing hardware designs and software simulators, the earlier stages involve using high-level models based on relatively simple equations. Designers can choose the parameters for the components they plan to include (e.g. 64KB of cache, four ALUs, etc) and the high-level models will give metrics of their efficiency (e.g. 10 picojoules of energy consumed per cycle, square micrometres of silicon area, etc). These numbers help us make early-stage decisions before getting our hands dirty. However, they're not completely accurate and so the estimates can be off compared to what we actually end up building.
For this project, we're interested in exploring how to apply machine learning techniques to learn these modelling equations, as well as the level of error they introduce. We see the potential to perform this early-stage modelling more quickly and accurately than existing tools, and to predict the characteristics of future technologies which we don't have access to yet. (A small sketch of the idea appears after this project entry.)
Preferred qualifications: Python (or other) for machine learning. Some familiarity with processors might be helpful but learning on-the-job is great too :)
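
As an illustration of the kind of learned model we have in mind, the sketch below fits regressors that map design parameters to efficiency metrics and reports the error the learned model introduces on held-out designs. The "ground truth" formulas and parameter ranges are invented stand-ins; real training data would come from an existing modelling tool or from measured designs.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_percentage_error

rng = np.random.default_rng(1)
n = 2000

# Design parameters: cache size (KB), number of ALUs, pipeline width.
cache_kb = rng.choice([16, 32, 64, 128, 256], n)
n_alus = rng.integers(1, 9, n)
width = rng.integers(1, 5, n)
X = np.column_stack([cache_kb, n_alus, width])

# Hypothetical "true" metrics with noise: energy per cycle (pJ) and area (mm^2).
energy_pj = 5.0 + 0.05 * cache_kb + 3.0 * n_alus * width + rng.normal(0, 1, n)
area_mm2 = 0.002 * cache_kb + 0.1 * n_alus + 0.15 * width + rng.normal(0, 0.05, n)

for name, y in [("energy (pJ/cycle)", energy_pj), ("area (mm^2)", area_mm2)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)
    model = RandomForestRegressor(n_estimators=200, random_state=1).fit(X_tr, y_tr)
    err = mean_absolute_percentage_error(y_te, model.predict(X_te))
    print(f"{name}: held-out error ~{100 * err:.1f}%")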


Name: Prabhanjan Ananth
Title: Developing Programming Toolkits for Quantum Cryptography
Description: [none provided]
Preferred qualifications: Solid programming skills, a decent mathematical background, and a strong work ethic and passion for learning.


Name: Tevfik Bultan
Title: Fuzzing for Vulnerability Signature Generation
Description: Discovering and mitigating software vulnerabilities is a common yet challenging security problem. One approach to protecting a program from being exploited by attackers is to generate vulnerability signatures that characterize all possible ways a vulnerability can be exploited. Preconditions and postconditions of vulnerabilities can provide a concise signature that recognizes the specific patterns or inputs associated with particular vulnerabilities. Preconditions define the expected state of a program before a specific function or code segment is executed, while postconditions describe the state after execution. Given a vulnerability signature, one can mitigate a vulnerability by ensuring that values satisfying the precondition never reach the vulnerable code.
This project will focus on using fuzz testing (an automated testing approach) to identify vulnerability signatures and to validate that, after mitigation, the vulnerability is removed. Fuzz testing searches the input space of a program for inputs that lead to a security violation. In this project, we will use fuzz testing to characterize the set of input values that can trigger a given vulnerability; after mitigation, we will use it again to validate that the vulnerability has been removed. (A toy sketch of this workflow appears after this project entry.)
Preferred qualifications: C programming skills will be necessary, familiarity with fuzz testing and security vulnerabilities would be helpful
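
A toy sketch of this workflow is below. It is written in plain Python for brevity (the actual project targets C programs and real fuzzing tools): a random fuzzer searches for inputs that trigger a deliberately planted overflow, the crashing inputs yield a precondition-style signature, and re-fuzzing the guarded version confirms the mitigation.

import random

BUFFER_SIZE = 16

def vulnerable_copy(data: bytes) -> None:
    """Stand-in for a buggy C routine: overflows when the input exceeds the buffer."""
    if len(data) > BUFFER_SIZE:          # the "vulnerability"
        raise MemoryError("buffer overflow")

def guarded_copy(data: bytes) -> None:
    """Mitigated version: reject inputs satisfying the vulnerability precondition."""
    if len(data) > BUFFER_SIZE:          # precondition check, i.e. the signature
        return                           # drop / sanitize instead of overflowing
    vulnerable_copy(data)

def fuzz(target, trials=10_000):
    """Return the randomly generated inputs that made the target crash."""
    crashing = []
    for _ in range(trials):
        data = bytes(random.getrandbits(8) for _ in range(random.randint(0, 64)))
        try:
            target(data)
        except MemoryError:
            crashing.append(data)
    return crashing

crashes = fuzz(vulnerable_copy)
print(f"crashing inputs found: {len(crashes)}")
print(f"inferred precondition: len(input) > {min(len(c) for c in crashes) - 1}")
assert not fuzz(guarded_copy), "mitigation failed"
print("after mitigation: no crashes found")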