Cyber Adaptive Learning System Laboratory (CALSys Lab)

Research Projects

 
To conduct cutting-edge cyber-threat intelligence research that will impact academia and industry, our lab currently investigates the following areas:

 

Hacker Data Collection: A deep understanding of the adversaries present in online hacker communities greatly aids proactive cybersecurity, allowing security teams to stay ahead of malicious hackers. We are therefore building a multi-component system for gathering cyber-threat intelligence from the darkweb. With this cyberinfrastructure at Cal Poly Pomona, we collect malicious hacking-related information from forum discussions and marketplace product and service offerings to shine a persistent light on the emerging technologies and capabilities of cyber-attackers. Other environments, such as instant messaging platforms (WhatsApp, Telegram) and Discord channels, will also be included soon. Our goal is a criminal hacking dataset that powers intelligent, data-driven security tools for cyber-defense. We plan to make this dataset the largest publicly searchable repository of criminal hacking data available to the entire security community.

Project awarded by the NSF Computer and Information Science and Engineering Research Initiation Initiative (CRII) Secure and Trustworthy Cyberspace (SaTC) in 2023.

 

CAPTCHA Solver: To automate data collection from websites, a spider needs to find and download HTML pages. Malicious hacker sites, especially those on the Tor network, often use CAPTCHAs to block web scrapers. This project uses image segmentation and deep learning to recognize the alphanumeric text embedded in the CAPTCHAs so that our robots can collect malicious hacker data in real time.
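
As a rough illustration of the segmentation step only, the sketch below splits a binarized CAPTCHA into per-character column ranges using a vertical projection. This is an assumption for illustration, not necessarily the lab's method: the function name segment_columns and the input format (a list of rows with 1 for ink and 0 for background) are hypothetical, and touching or overlapping characters would need a more robust approach before any classifier is applied.

```python
def segment_columns(binary_image):
    """Split a binarized image into column ranges that contain ink.

    binary_image: list of equal-length rows, 1 = ink, 0 = background
    (hypothetical input format). Returns (start, end) pairs, end exclusive.
    """
    if not binary_image:
        return []
    width = len(binary_image[0])
    # A column "has ink" if any row has a foreground pixel in it.
    has_ink = [any(row[x] for row in binary_image) for x in range(width)]

    segments = []
    start = None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a character begins
        elif not ink and start is not None:
            segments.append((start, x))    # a character ends at a blank column
            start = None
    if start is not None:
        segments.append((start, width))    # character touches the right edge
    return segments
```

Each returned range could then be cropped and passed to a character classifier; the deep learning part is deliberately left out of this sketch.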

                                                                                                                                                                

Hacker Site Recommender: Our research team currently searches for criminal hacking websites manually to enlarge our database. This project aims to automate that process by investigating the .onion links that hackers embed in forum posts. We consider each link's surrounding text, the previous and subsequent messages, and the landing page to decide whether the listed website is a potential target for scraping.
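
A minimal sketch of the first step, pulling .onion links and their surrounding text out of a post, might look like the following. The function name extract_onion_links and the window parameter are hypothetical. The pattern assumes lowercase addresses of 56 base32 characters (current v3 onion services) or 16 characters (the deprecated v2 format). How the extracted context is scored afterwards is not covered here.

```python
import re

# 56 base32 characters = v3 onion address; 16 = deprecated v2 address.
ONION_RE = re.compile(r"\b(?:[a-z2-7]{56}|[a-z2-7]{16})\.onion\b")

def extract_onion_links(post_text, window=40):
    """Return each .onion link in a post with the text around it.

    window: number of characters of context kept on each side
    (an illustrative default, not a tuned value).
    """
    results = []
    for match in ONION_RE.finditer(post_text):
        start = max(0, match.start() - window)
        end = min(len(post_text), match.end() + window)
        results.append({
            "link": match.group(0),
            "before": post_text[start:match.start()],
            "after": post_text[match.end():end],
        })
    return results
```

The "before" and "after" fields, together with neighboring messages and the landing page, could then serve as input features for a classifier that decides whether the site is worth scraping.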

                                                                                                                                                                

Image Analyzer: During the web scraping process, we save only encrypted images to avoid any ethical or criminal implications. Once encrypted, the images lose their real pixel representation and can no longer be compared. We are therefore working on an intermediate object representation, computed before encryption, that still allows images to be compared and reveals important similarities.
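
One possible kind of intermediate representation is a perceptual hash, computed before encryption and stored alongside the encrypted image. The sketch below uses a simple average hash as a stand-in; the text above does not say which representation the lab actually uses. It assumes the image has already been converted to a small grayscale grid (a list of rows of brightness values), and both function names are hypothetical.

```python
def average_hash(pixels):
    """Compute an average hash for a small grayscale image.

    pixels: list of rows of brightness values (assumed already downscaled).
    Each bit records whether a pixel is brighter than the image mean.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits

def hamming_distance(hash_a, hash_b):
    """Count the bits that differ between two hashes (0 = same pattern)."""
    return bin(hash_a ^ hash_b).count("1")
```

Two images with a small Hamming distance between their hashes are likely to be visually similar, and the comparison never requires decrypting the stored images.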

 

Identification of Key-hackers: Malicious hacker communities have participants with different levels of knowledge, and anyone who wants to identify emerging cyber-threats needs to scrutinize these individuals to find the key cyber-criminals. Key-hackers have advanced hacking skills and far more influence than the great majority of users in online hacker communities. Because this select group foments a promising market for vulnerability exploitation threats, it forms a natural lens through which security alert systems can look to predict cyber-attacks, which is why we are interested in finding them.

 


Hacker Engagement Prediction: Hacker communities are continually evolving. Due to social influence effects, values are often transmitted from one person to another, and this behavior is also observed among malicious hackers. Hackers with an established reputation generally use online platforms to advertise exploits, vulnerabilities, techniques, code samples, and targets, and also to recruit individuals for malicious campaigns, attracting low-skilled individuals who aim to improve their hacking skills. These influential activities not only expand the hackers’ networks, bringing in like-minded collaborators and learners, but also help them increase revenue. This project therefore studies hacker adoption behavior online, using it as a crowdsourced sensor to gain insight into future user activities that may lead to cyber-attacks, such as the purchase of a given exploit.


Uncovering Communities of Malware and Exploit Vendors: Online hacker marketplaces have become a central place for cybercriminals to purchase malicious products and services, taking advantage of the numerous offerings provided especially by trusted and successful hackers. In this context, we want to find communities of vendors with similar hacking expertise for surveillance purposes, helping defenders anticipate the imminent cyber-criminal activities of those individuals.
                                                                                                                                        

 

Vendor Alias Attribution: Vendors might use different aliases across multiple marketplaces to avoid being tracked by law enforcement agencies while they sell their exploits and services. In this project, we analyze vendor profiles and product offerings in depth to accurately attribute the activities disguised under multiple aliases to the same vendor.
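
To give a feel for how product offerings from two aliases could be compared, the sketch below scores writing-style similarity with character trigram counts and cosine similarity. This is a generic stylometry baseline chosen for illustration, not the lab's stated method, and the function names are hypothetical.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Count character n-grams after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))

def cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two n-gram count vectors (0.0 to 1.0)."""
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[g] * counts_b[g] for g in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A high score between listings posted under two different aliases would only be one weak signal; in practice it would be combined with other profile and product features before attributing both aliases to one vendor.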


Vulnerability Exploitation Prediction: This project focuses on predicting which known software vulnerabilities are likely to be exploited by malicious hackers. Because standard vulnerability scoring systems such as CVSS are known to be of little use for patch prioritization, we are exploring AI, machine learning, and SNA (Social Network Analysis) techniques on features computed from malicious hacker environments and security advisories to anticipate vulnerability exploitation.

                                           

Identification of Zero-Day Exploits: Although zero-day exploits target unknown vulnerabilities, they usually exist “in the wild” for over 300 days before identification. This project mines hacker data to discover what kinds of zero-day exploits are being developed and offered online.
                                                                                                                                             
Social Network Extraction of Malicious Hackers: Many studies of darkweb forums rely on social network features to accomplish their goals. However, because these environments are fragmented and their interactions disjointed, extracting the social network structure from user interaction data alone is hard. The lab's research agenda also includes NLP methods to derive accurate reply-to information, so that a better approximation of the hacker network topology can be inferred.
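
As a simple baseline for what reply-to inference could look like, the sketch below builds weighted reply edges from a thread. It is a naive heuristic assumed for illustration, far simpler than the NLP methods mentioned above: a post is linked to any earlier author whose username appears in its text, and otherwise to the most recent post by someone else. The function name and the (author, text) input format are hypothetical.

```python
from collections import Counter

def infer_reply_edges(thread):
    """Infer directed reply edges from a thread.

    thread: list of (author, text) tuples in posting order.
    Returns a Counter mapping (replier, target) to a reply count.
    """
    edges = Counter()
    earlier_authors = []  # authors of previous posts, in order, with repeats
    for author, text in thread:
        lowered = text.lower()
        others = [u for u in dict.fromkeys(earlier_authors) if u != author]
        # Naive substring match: a short username can also match inside a word.
        targets = [u for u in others if u.lower() in lowered]
        if not targets:
            # Fallback: assume a reply to the most recent post by someone else.
            previous = [u for u in earlier_authors if u != author]
            targets = previous[-1:]
        for target in targets:
            edges[(author, target)] += 1
        earlier_authors.append(author)
    return edges
```

The resulting edge counts can be loaded into any graph library as a weighted directed network, giving a first approximation of the topology that more accurate reply-to models would refine.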
                                                                                                                                               
Malicious Viral Cascade Prediction: Predicting when a hacker message will go “viral” is important for cybersecurity, since information cascades can signal hacktivism campaigns or mass adoption of cyber-threats. This research project therefore models information diffusion in hacker communities, investigating which influence and topological patterns can be leveraged for the early detection of popular threads.