Cyberspace is vulnerable to cyberattacks. A major challenge in cyberspace is cyberattack attribution. Cyberattack attribution can be defined as “the matching of a certain attacker or group to a cyber-attack and the consequent unveiling of the real-world actor(s)’ identity” [1]. The process frequently involves analysing digital traces left by the attackers to discover their identities, motivations, techniques and so on. The traces can range from technical details, such as IP addresses and pieces of malware code, to broader behavioural patterns and geopolitical contexts.
Broadly speaking, attack attribution can be performed at seven levels:
1) the host originating the attack,
2) intermediary hosts,
3) ISPs through which the attack passes,
4) malicious actors that carry out the attack,
5) institutions that support the attack,
6) political or governmental organisations behind the attack,
7) geo-location of the attack.
In order to effectively combat cybercrime and protect against future attacks, it is essential to accurately attribute these attacks to their sources. Attack attribution is a key issue in many law enforcement operations addressing the forms of cybercrime prioritised in Europol's IOCTA (Internet Organised Crime Threat Assessment) reports, such as data compromise attacks, business email compromise, phishing, social engineering, ransomware and other types of malware.
Accurate cyber attribution can also enable targeted legal, political, and security responses, and function as both a deterrent to potential attackers and a tool for fostering international collaboration on cybersecurity. However, one of the major challenges in cyber attribution is the difficulty of accurately determining the identity of the attackers [2]. Publicly attributing cyberattacks presents multiple political, legal and technical challenges, and attribution is particularly difficult because attackers rely on deception to obfuscate their identities [3]. These difficulties arise from various factors, such as the use of sophisticated techniques to mask the origin of attacks, the use of proxy servers or compromised systems to launch the attacks, and the involvement of skilled individuals or organised groups that operate under pseudonyms or on the dark web. Moreover, cybercriminals often employ tactics such as distributed denial-of-service attacks or malware infections that can further complicate the process of attribution.
Models for technical attack attribution
Attack attribution models provide a framework that helps analysts follow a methodical approach to attribution. This is the starting point, since each organisation has to decide how to analyse a given incident and whether or not attribution is feasible. The best-known models are direct hacker profiling [4][5], the Diamond Model [6][7], the Q-Model [2], the game-theoretic approach [8] and the Cyber Attribution Model (CAM) [4][9].
In particular, the CAM model, shown in the next figure [9], addresses both cyberattack investigation and profiling of cyberthreat actors. Both activities are associated with “technical and socio-political contextual indicators and components” such as victimology, infrastructure, capabilities, and motivation. These components are useful for identifying adversarial TTPs (tactics, techniques and procedures) and modus operandi. On the one hand, external threat actor profiles are built from available knowledge of previous attacks. On the other hand, internal investigations are carried out in parallel to clearly characterise the given incident. The TTPs determined by the two activities are then compared for attribution. The model can also be applied to identify inconsistencies in the attribution process, also known as “false flags”.

Figure 1. Cyber Attribution – CAM Model [9]
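The comparison step of the CAM model can be illustrated with a small sketch. It is not taken from [9]: the use of Jaccard similarity, the `Profile` class, the thresholds in `possible_false_flag` and all indicator values are assumptions made for illustration only. The idea is to score, component by component, how closely the indicators found in an incident overlap with those of a known actor profile, and to treat a mix of very strong and very weak matches as an inconsistency worth a closer look.

```python
from dataclasses import dataclass

# The four component names come from the text; everything else is illustrative.
COMPONENTS = ("victimology", "infrastructure", "capabilities", "motivation")


@dataclass(frozen=True)
class Profile:
    """Indicators observed for an incident or known for a threat actor."""
    name: str
    victimology: frozenset = frozenset()
    infrastructure: frozenset = frozenset()
    capabilities: frozenset = frozenset()
    motivation: frozenset = frozenset()


def jaccard(a, b):
    """Share of indicators the two sets have in common (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def compare(incident, actor):
    """Return one similarity score per CAM component."""
    return {c: jaccard(getattr(incident, c), getattr(actor, c)) for c in COMPONENTS}


def possible_false_flag(scores, high=0.6, low=0.2):
    """Hypothetical heuristic: strong match on one component, near-zero on another."""
    values = list(scores.values())
    return max(values) >= high and min(values) <= low


# Made-up incident and actor profile.
incident = Profile(
    name="incident-001",
    victimology=frozenset({"energy"}),
    infrastructure=frozenset({"vps-provider-x", "dyn-dns"}),
    capabilities=frozenset({"spearphishing", "custom-loader"}),
    motivation=frozenset({"espionage"}),
)
known_actor = Profile(
    name="ACTOR-A",
    victimology=frozenset({"energy", "government"}),
    infrastructure=frozenset({"vps-provider-x", "dyn-dns"}),
    capabilities=frozenset({"spearphishing", "custom-loader", "wiper"}),
    motivation=frozenset({"espionage"}),
)

scores = compare(incident, known_actor)
for component, score in scores.items():
    print(f"{component}: {score:.2f}")
print("possible false flag:", possible_false_flag(scores))
```

In practice an analyst would weigh the components differently and treat the scores as one input among many. A high overlap on easily forged indicators, such as infrastructure, says little on its own.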
Artifacts and traces for technical attack attribution
A prerequisite for technical attack attribution is to know the attack tools and techniques commonly used by attackers and to understand the traces, usually called artifacts, that are left on a victim’s infrastructure or elsewhere, such as in cloud provider facilities and at other third parties. It is of utmost importance to understand how trustworthy the conclusions obtained by analysing these artifacts are. Therefore, the first activity is to gather artifacts and other traces that support the technical attack attribution process.
A comprehensive study identifying the most relevant artifacts and traces that can be collected can be found in [9], where the authors established a simplified attribution process based on the following four steps:
- Discover the sources that provide basic case-specific artifacts.
- Gather artifacts.
- Derive relevant information from basic data.
- Answer questions that aid attribution.
This study also provides an interesting distribution of artifacts along the kill chain, a concept first introduced by Lockheed Martin Corporation in [10]. The intrusion kill chain distinguishes seven phases of multi-stage attacks: Reconnaissance, Weaponisation, Delivery, Exploitation, Installation, Command and Control (C2), and Actions on Objectives. This work has been analysed in depth and extended within the CYBERSPACE project.
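As a rough illustration of distributing artifacts along the kill chain, the sketch below models the seven phases as an enumeration and groups collected artifacts by phase. The phase names follow the text. The example artifacts, their assignment to phases, and the helper names `group_by_phase` and `uncovered_phases` are assumptions for illustration and are not the mapping published in [9].

```python
from collections import defaultdict
from enum import Enum


class Phase(Enum):
    """The seven intrusion kill chain phases, in order."""
    RECONNAISSANCE = 1
    WEAPONISATION = 2
    DELIVERY = 3
    EXPLOITATION = 4
    INSTALLATION = 5
    COMMAND_AND_CONTROL = 6
    ACTIONS_ON_OBJECTIVES = 7


def group_by_phase(artifacts):
    """Group (phase, description) pairs by phase, keeping kill chain order."""
    grouped = defaultdict(list)
    for phase, description in artifacts:
        grouped[phase].append(description)
    return {phase: grouped[phase] for phase in Phase if phase in grouped}


def uncovered_phases(artifacts):
    """List the phases for which no artifact has been gathered yet."""
    seen = {phase for phase, _ in artifacts}
    return [phase for phase in Phase if phase not in seen]


# Made-up artifacts gathered during a hypothetical investigation.
artifacts = [
    (Phase.DELIVERY, "phishing email headers"),
    (Phase.COMMAND_AND_CONTROL, "DNS queries to a suspected C2 domain"),
    (Phase.INSTALLATION, "new autostart registry key"),
    (Phase.DELIVERY, "hash of the malicious attachment"),
]

for phase, items in group_by_phase(artifacts).items():
    print(phase.name, "->", items)
print("no artifacts yet for:", [p.name for p in uncovered_phases(artifacts)])
```

Listing the phases without artifacts gives investigators a simple view of where further collection might be worthwhile.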
Machine Learning for attack attribution
Machine Learning techniques are generally used to detect and identify attacks, but rarely for cyberattack attribution. Some examples of Machine Learning techniques for attack attribution can be found in [11][12][13]. However, it should be noted that the field where Machine Learning techniques have been most widely applied is malware attribution.
Malware attribution is usually done by experts through time-consuming manual analysis [14]. Such analysis may involve the examination of malware reports and the detection of previously observed patterns. To support it, researchers have proposed algorithms that classify malware samples and find patterns among them using different techniques (e.g., source code examination, reverse engineering, dynamic analysis, etc.) to partially automate the analysis and attribution of malware [15][16][17]. These studies generally address malware attribution by applying clustering or classification techniques to features extracted beforehand with diverse analysis tools. However, it is not trivial to decide which features are the most representative and how to process them to improve attribution precision. The more advanced approaches are starting to apply NLP techniques.
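The feature-based approach can be sketched in a few lines. This is a deliberately simplified stand-in rather than any of the methods in [15][16][17]: it assumes byte n-grams as features, Jaccard similarity as the measure, and a nearest-reference rule with an arbitrary threshold. The family labels and byte strings are made up.

```python
def byte_ngrams(data: bytes, n: int = 4) -> set:
    """Extract the set of overlapping byte n-grams used as features."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Share of features the two sets have in common (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def attribute(sample: bytes, labelled: dict, threshold: float = 0.3):
    """Return (label, score) of the most similar labelled sample.

    The label is None when no reference reaches the (assumed) threshold,
    so that weak matches are not reported as attributions.
    """
    grams = byte_ngrams(sample)
    best_label, best_score = None, 0.0
    for label, reference in labelled.items():
        score = jaccard(grams, byte_ngrams(reference))
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


# Made-up reference samples standing in for binaries of known families.
labelled = {
    "family-a": b"connect;beacon;sleep;exfil;" * 4,
    "family-b": b"encrypt;drop_note;delete_shadow;" * 4,
}

label, score = attribute(b"connect;beacon;sleep;exfil;upload;", labelled)
print(label, round(score, 2))

unknown_label, unknown_score = attribute(b"zzzzzzzzzzzz", labelled)
print(unknown_label, unknown_score)
```

Raw byte n-grams are easily disturbed by packing or recompilation, which is one reason why choosing representative features is difficult in practice.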
Conclusion
In CYBERSPACE, one of the main technical objectives is to provide LEAs and other incident response and digital forensics experts with key facts that can help relate malware samples to known threats and threat actors. To this end, the OpenCTI platform [18] is considered an important enabler thanks to its CTI storage, correlation and analysis capabilities. CYBERSPACE is therefore currently exploring how to extend the OpenCTI platform with technical attribution capabilities and machine learning methods based on malware similarity analysis, which can be combined to better support incident investigators and threat researchers in understanding the origins of cyberattacks.
Authors: Mª Carmen Palacios and Ana Ayerbe (TECNALIA)
References
[1] Saalbach, K. (2019). Attribution of Cyber Attacks. Information Technology for Peace and Security.
[2] Rid, T., & Buchanan, B. (2015). Attributing cyber attacks. Journal of Strategic Studies, 38(1-2), 4-37. Available at: https://figshare.com/articles/journal_contribution/Attributing_Cyber_Attacks/1284592
[3] Lindsay, J. R. (2015). Tipping the scales: the attribution problem and the feasibility of deterrence against cyberattack. Journal of Cybersecurity, 1(1), 53-67.
[4] Pahi, T., & Skopik, F. (2019, July). Cyber attribution 2.0: Capture the false flag. In Proceedings of the 18th European Conference on Cyber Warfare and Security (ECCWS 2019) (pp. 338-345).
[5] Vernacchia, S. (2018). A practical method of identifying cyberattacks. PWC Middle East.
[6] Caltagirone, S., Pendergast, A., & Betz, C. (2013). The diamond model of intrusion analysis. Center For Cyber Intelligence Analysis and Threat Research Hanover Md.
[7] Mavroeidis, V., & Bromander, S. (2017, September). Cyber threat intelligence model: an evaluation of taxonomies, sharing standards, and ontologies within cyber threat intelligence. In 2017 European Intelligence and Security Informatics Conference (EISIC) (pp. 91-98). IEEE.
[8] Edwards, B., Furnas, A., Forrest, S., & Axelrod, R. (2017). Strategic aspects of cyberattack, attribution, and blame. Proceedings of the National Academy of Sciences, 114(11), 2825-2830.
[9] Skopik, F., & Pahi, T. (2020). Under false flag: Using technical artifacts for cyber attack attribution. Cybersecurity, 3, 1-20.
[10] Hutchins, E. M., Cloppert, M. J., & Amin, R. M. (2011). Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains. Leading Issues in Information Warfare & Security Research, 1(1), 80.
[11] Han, M. L., Han, H. C., Kang, A. R., Kwak, B. I., Mohaisen, A., & Kim, H. K. (2016). Whap: Web-hacking profiling using case-based reasoning. In 2016 IEEE Conference on Communications and Network Security (CNS) (pp. 344-345). IEEE.
[12] Noever, D., & Kinnaird, D. (2016). Identifying the Perpetrator: Attribution of Cyber-attacks based on the Integrated Crisis Early Warning.
[13] Qiang, L., Ze-Ming, Y., Bao-Xu, L., & Zheng-Wei, J. (2016). A reasoning method of cyber-attack attribution based on threat intelligence. International Journal of Computer and Systems Engineering, 10(5), 920-924.
[14] De los Santos, S., Guzmán, A., & Torrano, C. (2018). Android Malware Pattern Recognition for Fraud Detection and Attribution: A Case Study.
[15] Alrabaee, S., Wang, L., & Debbabi, M. (2016). BinGold: Towards robust binary analysis by extracting the semantics of binary code as semantic flow graphs (SFGs). Digital Investigation, 18, S11-S22.
[16] Alrabaee, S., Shirani, P., Wang, L., & Debbabi, M. (2015). Sigma: A semantic integrated graph matching approach for identifying reused functions in binary code. Digital Investigation, 12, S61-S71.
[17] Ruttenberg, B., Miles, C., Kellogg, L., Notani, V., Howard, M., LeDoux, C., … & Pfeffer, A. (2014). Identifying shared software components to support malware forensics. In Detection of Intrusions and Malware, and Vulnerability Assessment: 11th International Conference, DIMVA 2014, Egham, UK, July 10-11, 2014. Proceedings 11 (pp. 21-40). Springer International Publishing.
[18] OpenCTI platform. Available at: https://filigran.io/solutions/products/opencti-threat-intelligence/ (accessed October 2023).