[{"authors":null,"categories":null,"content":"body {text-align: justify}\rThomas Pasquier is an Assistant Professor in the Department of Computer Science at the University of British Columbia. He is affiliated with the UBC Security \u0026amp; Privacy Group and the Systopia Lab, UBC’s systems research group. His work focuses on the design and implementation of computer systems that are inherently observable and transparent. His research interests include digital provenance, system auditing and accountability, intrusion detection, and performance optimization.\nThomas Pasquier est professeur adjoint au Département d’informatique de l’Université de la Colombie-Britannique (UBC). Il est affilié au Security \u0026amp; Privacy Group de l’UBC et au Systopia Lab, le groupe de recherche en systèmes de l’UBC. Ses travaux portent sur la conception et la mise en oeuvre de systèmes informatiques intrinsèquement observables et transparents. Ses intérêts de recherche incluent la provenance numérique, l’audit et la responsabilité des systèmes, la détection d’intrusion et l’optimisation des performances.\n","date":-62135596800,"expirydate":-62135596800,"kind":"term","lang":"en","lastmod":1761922920,"objectID":"2525497d367e79493fd32b198b28f040","permalink":"","publishdate":"0001-01-01T00:00:00Z","relpermalink":"","section":"authors","summary":"body {text-align: justify}\rThomas Pasquier is an Assistant Professor in the Department of Computer Science at the University of British Columbia. He is affiliated with the UBC Security \u0026 Privacy Group and the Systopia Lab, UBC’s systems research group. His work focuses on the design and implementation of computer systems that are inherently observable and transparent. His research interests include digital provenance, system auditing and accountability, intrusion detection, and performance optimization.\nThomas Pasquier est professeur adjoint au Département d’informatique de l’Université de la Colombie-Britannique (UBC). 
Il est affilié au Security \u0026 Privacy Group de l’UBC et au Systopia Lab, le groupe de recherche en systèmes de l’UBC. Ses travaux portent sur la conception et la mise en oeuvre de systèmes informatiques intrinsèquement observables et transparents. Ses intérêts de recherche incluent la provenance numérique, l’audit et la responsabilité des systèmes, la détection d’intrusion et l’optimisation des performances.","tags":null,"title":"Thomas Pasquier","type":"authors"},{"authors":["T Prasad","R Vora","SY Lim","NP Phong","T Pasquier"],"categories":null,"content":"","date":1769990400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1771516783,"objectID":"073fb4c515d9106235cb2d07245c42e5","permalink":"https://tfjmp.org/publication/2026-madweb/","publishdate":"2017-01-01T00:00:00Z","relpermalink":"/publication/2026-madweb/","section":"publication","summary":"Third-party advertising and tracking (A\u0026T) are pervasive across the web, yet user exposure varies significantly with browser choice, browsing location, and hosting jurisdiction. We systematically study how these three factors shape tracking by conducting synchronized crawls of 743 popular websites from 8 geographic vantage points using 4 browsers and 2 consent states. Our analysis reveals that browser choice, user location, and hosting jurisdiction each shape tracking exposure in distinct ways. Privacy-focused browsers block more third-party trackers, reducing observed A\u0026T domains by up to 30% in permissive regulatory environments, but offer smaller relative gains in stricter regions. User location influences the tracking volume, the prevalence of consent banners, and the extent of cross-border tracking: GDPR-regulated locations exhibit about 80% fewer third-party A\u0026T domains before consent and keep 89–91% of A\u0026T requests within the EEA or adequacy countries. 
Hosting jurisdiction plays a smaller role; tracking exposure varies most strongly with inferred user location rather than where sites are hosted. These findings underscore both the power and limitations of user agency, informing the design of privacy tools, regulatory enforcement strategies, and future measurement methodologies.","tags":null,"title":"RegTrack: Uncovering Global Disparities in Third-party Advertising and Tracking","type":"publication"},{"authors":["T Bilot","T Pasquier"],"categories":null,"content":"","date":1757030400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1757087384,"objectID":"3932bea174ee6be85f928626b2ebd62d","permalink":"https://tfjmp.org/publication/2025-usenix-login/","publishdate":"2017-01-01T00:00:00Z","relpermalink":"/publication/2025-usenix-login/","section":"publication","summary":"In recent years, researchers have turned to provenance-based intrusion detection systems (PIDSs) as a promising way to spot attacks that evade traditional defenses. At their core, these systems build provenance graphs, which act like detailed maps of how information flows through a computer. Such graphs treat system entities (e.g., processes, files, and network connections) as nodes, and the interactions between them (e.g., system calls) as edges. By analyzing these graphs, anomaly-based PIDSs learn what *normal* behavior looks like and then flag unusual activity, making them well-suited for catching stealthy attacks such as advanced persistent threats (APTs) or previously unknown zero-day exploits. Despite claims of near-perfect detection rates, today's PIDSs are nowhere near ready for real-world use. Their biggest flaw is how they report results: most state-of-the-art PIDSs generate coarse-grained alerts with tens of thousands of nodes or events, burying analysts under mountains of noise. This is not just an engineering oversight; it is the direct result of evaluation practices. 
By optimizing for specific evaluation metrics instead of usable outputs, the community has built detectors that are impressive on paper but fall short of being helpful to a security team.","tags":null,"title":"Toward Practical and Usable Provenance-based Intrusion Detection Systems","type":"publication"},{"authors":["T Bilot","B Jiang","Z Li","N El Madhoun","K Al Agha","A Zouaoui","T Pasquier"],"categories":null,"content":" ","date":1747699200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1770063916,"objectID":"f74cdbe0380f911fe10e2d6a70d57ae8","permalink":"https://tfjmp.org/publication/2025-usenixsec-2/","publishdate":"2017-01-01T00:00:00Z","relpermalink":"/publication/2025-usenixsec-2/","section":"publication","summary":"Provenance-based intrusion detection systems (PIDSs) have garnered significant attention from the research community over the past decade. Although recent studies report near perfect detection performance, we show that these systems are not viable for practical deployment. We implemented eight state-of-the-art systems within a unified framework and identified nine key shortcomings that hinder their practical adoption. Through extensive experiments, we quantify the impact of these shortcomings using cybersecurity-oriented metrics and propose solutions to address them for real-world applicability. Building on these insights, we demonstrate that most existing systems add unnecessary complexity, whereas a simple neural network achieves state-of-the-art detection on five of seven DARPA datasets while offering a lighter, faster, and real-time detection solution. 
Finally, we highlight critical open research challenges that remain unaddressed in the current literature, paving the way for future advancements.","tags":null,"title":"Sometimes Simpler is Better: A Comprehensive Analysis of State-of-the-Art Provenance-Based Intrusion Detection Systems","type":"publication"},{"authors":["B Jiang","T Bilot","N El Madhoun","K Al Agha","A Zouaoui","S Iqbal","X Han","T Pasquier"],"categories":null,"content":" ","date":1736640000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1770063916,"objectID":"fbcc5da5c0c98744051c5293c427438b","permalink":"https://tfjmp.org/publication/2025-usenixsec/","publishdate":"2017-01-01T00:00:00Z","relpermalink":"/publication/2025-usenixsec/","section":"publication","summary":"Past success in applying machine learning to data provenance graphs -- a structured representation of the history of operating system activities -- to detect host system intrusions has fueled continued interest in the security community. Recent solutions, particularly anomaly-based approaches using graph neural networks (GNNs) to detect previously unknown attacks, have reported near-perfect accuracy. Surprisingly, despite this high performance, the industry remains reluctant to adopt these intrusion detection systems (IDSs). We identify Quality of Attribution (QoA) as the key factor contributing to this disconnect. QoA refers to the amount of effort required from a human analyst to investigate an IDS's detection output, uncover the root causes of an attack, understand its ramifications, and dismiss potential false alarms. Unfortunately, prior work often generates large volumes of low-QoA output, much of which is irrelevant to attack activities, leading to alert fatigue and analyst burnout. We introduce Orthrus, the first IDS to achieve high-QoA detection on data provenance graphs at the node level. 
Orthrus detects malicious hosts using a GNN encoder designed to capture the fine-grained spatio-temporal dynamics of system events. It then reconstructs the attack path through dependency analysis to ensure high-QoA detection. We compare Orthrus against five state-of-the-art IDSs. Orthrus reduces the number of nodes requiring manual inspection for attack attribution by several orders of magnitude, significantly easing the burden on security analysts while achieving strong detection performance.","tags":null,"title":"ORTHRUS: Achieving High Quality of Attribution in Provenance-based Intrusion Detection Systems","type":"publication"},{"authors":["T Abrar","A Shamail","M J Iqbal","M Iqbal","A Zouaoui","A Ahmed","M Abdullah","M Shayan","F Zaffar","T Pasquier","D Eyers","A Gehani"],"categories":null,"content":"","date":1736553600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1751429854,"objectID":"0c5f49d8c85ab42bd8c65cb34b9bddfa","permalink":"https://tfjmp.org/publication/2025-rep/","publishdate":"2017-01-01T00:00:00Z","relpermalink":"/publication/2025-rep/","section":"publication","summary":"As cyber-threats grow in scale and sophistication, intrusion detection systems that incorporate system provenance and deep learning have emerged as a promising direction for detecting advanced persistent threats (APTs). We endeavor to reproduce the experimental results from eight such systems published over the past four years in top-tier research venues. We encountered numerous challenges that obstruct reproducibility, including incomplete or non-functional source code releases, missing documentation, unavailability of datasets or detailed preprocessing steps, and unclear or inconsistent descriptions of experimental procedures. We detail and categorize these challenges to demonstrate the obstacles researchers may encounter when reproducing studies in this domain. 
Our findings highlight gaps in reaching the ideals of open science in this area of intrusion detection research.","tags":null,"title":"On the Reproducibility of Provenance-based Intrusion Detection that uses Deep Learning","type":"publication"},{"authors":["SY Lim","T Prasad","X Han","T Pasquier"],"categories":null,"content":" ","date":1729209600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1747445240,"objectID":"efd89ba7f10d5aec14ef34ab361d778b","permalink":"https://tfjmp.org/publication/2024-ccsw/","publishdate":"2017-01-01T00:00:00Z","relpermalink":"/publication/2024-ccsw/","section":"publication","summary":"The eBPF framework enables execution of user-provided code in the Linux kernel. In the last few years, a large ecosystem of cloud services has leveraged eBPF to enhance container security, system observability, and network management. Meanwhile, incessant discoveries of memory safety vulnerabilities have left the systems community with no choice but to disallow unprivileged eBPF programs, which unfortunately limits eBPF use to only privileged users. To improve run-time safety of the framework, we introduce SafeBPF, a general design that isolates eBPF programs from the rest of the kernel to prevent memory safety vulnerabilities from being exploited. We present a pure software implementation using a Software-based Fault Isolation (SFI) approach and a hardware assisted implementation that leverages ARM’s Memory Tagging Extension (MTE). 
We show that SafeBPF incurs up to 4% overhead on macrobenchmarks while achieving desired security properties.","tags":null,"title":"SafeBPF: Hardware-assisted Defense-in-depth for eBPF Kernel Extensions","type":"publication"},{"authors":["X Cao","S Patel","SY Lim","X Han","T Pasquier"],"categories":null,"content":" ","date":1720569600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1747445240,"objectID":"446d1c959fdf708854dae6e7bf9e4557","permalink":"https://tfjmp.org/publication/2024-atc/","publishdate":"2017-01-01T00:00:00Z","relpermalink":"/publication/2024-atc/","section":"publication","summary":"Monolithic operating systems are infamously complex. Linux in particular has a tendency to intermingle policy and mechanisms in a manner that hinders modularity. This is especially problematic when developers aim to finely optimize performance, since it is often the case that a default policy in Linux, while performing well on average, cannot achieve the optimal performance in all circumstances. However, developing and maintaining a bespoke kernel to satisfy the needs of a specific application is usually an unrealistic endeavor due to the high software engineering cost. Therefore, we need a mechanism to easily customize the kernel's policies and behavior. In this paper, we design a framework called FetchBPF that addresses this problem in the context of memory prefetching. FetchBPF extends the widely used eBPF framework to allow developers to easily express, develop, and deploy prefetching policies without modifying the kernel codebase. 
We implement various memory prefetching policies from the literature and demonstrate that our deployment model incurs negligible overhead as compared to the equivalent native kernel implementation.","tags":null,"title":"FetchBPF: Customizable Prefetching Policies in Linux with eBPF","type":"publication"},{"authors":["N Boufford","J Wonsil","A Pocock","J Sullivan","M Seltzer","T Pasquier"],"categories":null,"content":"","date":1718668800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1712333431,"objectID":"d0026bef20e75a79610d539758575a1c","permalink":"https://tfjmp.org/publication/2024-rep/","publishdate":"2017-01-01T00:00:00Z","relpermalink":"/publication/2024-rep/","section":"publication","summary":"Scientists use complex multistep workflows to analyze data. However, reproducing computational experiments is often difficult as scientists’ software engineering practices are geared towards the science, not the programming. In particular, reproducing a scientific workflow frequently requires information about its execution. This information includes the precise versions of packages and libraries used, the particular processor used to perform floating point computation, and the language runtime used. This can be extracted from data provenance, the formal record of what happened during an experiment. However, data provenance is inherently graph-structured and often large, which makes interpretation challenging. Rather than exposing data provenance through its graphical representation, we propose a textual one and use a large language model to generate it. We develop techniques for prompting large language models to automatically generate textual summaries of provenance data. We conduct a user study to compare the effectiveness of these summaries to the more common node-link diagram representation. Study participants are able to extract useful information from both the textual summaries and node-link diagrams. 
The textual summaries were particularly beneficial for scientists with low computational expertise. We discuss the qualitative results from our study to motivate future designs for reproducibility tools.","tags":null,"title":"Computational Experiment Comprehension using Provenance Summarization","type":"publication"},{"authors":["Z Cheng","Q Lv","J Liang","Y Wang","D Sun","T Pasquier","X Han"],"categories":null,"content":"","date":1716163200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1747445240,"objectID":"5466234391b8345cd53c413ba96ef425","permalink":"https://tfjmp.org/publication/2024-sp/","publishdate":"2017-01-01T00:00:00Z","relpermalink":"/publication/2024-sp/","section":"publication","summary":"Provenance graphs are structured audit logs that describe the history of a system's execution. Recent studies have explored a variety of techniques to analyze provenance graphs for automated host intrusion detection, focusing particularly on advanced persistent threats. Sifting through their design documents, we identify four common dimensions that drive the development of *provenance-based intrusion detection systems* (PIDSes): **scope** (can PIDSes detect modern attacks that infiltrate across application boundaries?), **attack agnosticity** (can PIDSes detect novel attacks without a priori knowledge of attack characteristics?), **timeliness** (can PIDSes efficiently monitor host systems as they run?), and **attack reconstruction** (can PIDSes distill attack activity from large provenance graphs so that sysadmins can easily understand and quickly respond to system intrusion?). We present KAIROS, the first PIDS that simultaneously satisfies the desiderata in all four dimensions, whereas existing approaches sacrifice at least one and struggle to achieve comparable detection performance. 
Kairos leverages a novel graph neural network-based encoder-decoder architecture that learns the temporal evolution of a provenance graph's structural changes to quantify the degree of *anomalousness* for each system event. Then, based on this fine-grained information, Kairos reconstructs attack footprints, generating compact *summary graphs* that accurately describe malicious activity over a *stream* of system audit logs. Using state-of-the-art benchmark datasets, we demonstrate that Kairos outperforms previous approaches.","tags":null,"title":"Kairos: Practical Intrusion Detection and Investigation using Whole-system Provenance","type":"publication"},{"authors":["Z Cheng","Q Lv","J Liang","Y Wang","D Sun","T Pasquier","X Han"],"categories":null,"content":"","date":1716076800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1747445240,"objectID":"59f84a9dd76a7513ae3bad63a8046baa","permalink":"https://tfjmp.org/publication/2024-sp-supp/","publishdate":"2017-01-01T00:00:00Z","relpermalink":"/publication/2024-sp-supp/","section":"publication","summary":"This document is a companion containing material supplementary to our paper published in the 45th IEEE Symposium on Security and Privacy (S\u0026P 2024).","tags":null,"title":"Kairos: Practical Intrusion Detection and Investigation using Whole-system Provenance (Supplementary Material)","type":"publication"},{"authors":["Thomas Pasquier"],"categories":["Advising"],"content":"The submission of your thesis proposal should normally occur in the second year of the PhD program and before the end of the third year. You can find more information about the process on the department website. The process itself is clearly detailed, but the expected content of the proposal is underspecified. This page describes my expectations for the students I supervise.\nFormatting and Length I strongly recommend using the following LaTeX template. 
Remember that, as per department rules, your thesis proposal must not exceed 50 pages (excluding appendices). You should also update and remove part of the front matter from the template (e.g., list of figures, chapter list, etc.). The page limit is a maximum; it is not a target. Successful thesis proposals can be shorter than the limit. Writing within the page limit is important. You should add all submissions and accepted publications since the start of your PhD in the appendix.\nContent Summary This should be 1 or 2 pages long and give a brief summary of the research you intend to pursue. The early part of your summary should contain your thesis statement (more examples here). The thesis statement is central to the proposal and you should build the rest of the document around it. Your goal is to demonstrate that your proposed thesis is non-trivial, novel, plausible and, importantly, useful.\nYou can later subdivide your thesis statement into multiple research questions, but you must ensure that they are clearly interconnected, and that they will lead to a coherent narrative throughout your thesis. You should also discuss why answering those questions is important.\nLiterature review The goal in this section is to demonstrate your understanding of the literature, and to show the limitations of previous work. This should be the basis for your claim of novelty, and it should be clear how the proposed work fits within the existing literature. The literature review should provide a critical assessment of past work, including:\n the identification of foundational work in the topic area; the most closely related prior work; a clear discussion of their strengths and limitations. You should consider that this part of the proposal will be used as a chapter in your thesis.\nProgress Report You should have done preliminary research by the time you submit your proposal and, ideally, have published some work based on your RPE. 
The goal is to show the committee what you are capable of. This helps the committee assess the plausibility of the thesis and of your proposed plan. Published papers should be available in the appendix, and you do not need to reproduce their content. You should summarize them briefly in a self-contained way.\nYou should also consider listing talks you have given, internships, or any other relevant activities.\nResearch Proposal You should build from your proposal summary and discuss how you are planning to tackle your thesis and your research questions over the next few years. It may be useful to think of this in terms of planned publications. You could subdivide the planned research into multiple chunks. Each chunk could be summarized in 3 or 4 paragraphs and correspond to an academic paper. Those papers will form the basis for your thesis chapters. At this stage you should plan for at least 3 full academic papers.\nPlan and Timeline While your proposal should give the impression that failure of your research is unlikely, in reality this possibility exists. Indeed, it would not be research if failure were not possible. Consequently, your plan must account for possible setbacks and failures, and must discuss what you plan to do if something does not work out. Your plan must also contain milestones as well as their completion and success criteria. A milestone can be, for example, the submission of a paper, the release of a dataset, the completion of some software development task, etc. You may also discuss the evaluation strategies you will adopt to measure success (e.g., your system successfully prevents vulnerability X while adding less than Y% overhead). This list of milestones should clearly lead, ultimately, to the submission and defense of your thesis.\nThe milestones you present are your best guesses at the time of your proposal. 
You will not be held to them, but you should periodically refer to your schedule, update it as necessary, and become better at estimating how long it will take to complete your work. They are mostly there to demonstrate that you understand the timescale involved in overcoming different research challenges (e.g., paper submissions must be scheduled at a reasonable and plausible pace).\nFinally, if you need specialized hardware or specific software resources to complete your research, this must be discussed in advance and clearly stated in your proposal.\nProposal Submission You should plan for a couple of months to work on your proposal. You should be ready to send and discuss regular updates …
Our early proof-of-concept shows that SandBPF can effectively prevent exploits missed by eBPF’s native safety mechanism (i.e., static verification) while incurring 0%-10% overhead on webserver benchmarks.","tags":null,"title":"Unleashing Unprivileged eBPF Potential with Dynamic Sandboxing","type":"publication"},{"authors":["Thomas Pasquier"],"categories":["Course"],"content":"Course Description The goal of this course is to expose students to a variety of topics in Systems Security. Security inherently touches on all areas of computer science. Therefore, this course was designed as a breadth course, addressed to all students in the department. The core idea underlying the course is to bring together a diversity of viewpoints to generate interesting discussions. On the other hand, we could also easily design a depth course focused on any of the topics we will discuss. The project component is the opportunity for students to explore one of those topics in more depth. Students are free (and encouraged) to apply their expertise (ML, PL, HCI, architecture etc.) to solve a specific Security problem. Some of the papers have been selected explicitly to highlight the interdisciplinary nature of Systems Security and to showcase how diverse perspectives are welcomed and appreciated.\nCourse Requirements There is no specific pre-requisite for this course outside of an undergraduate degree in Computer Science or closely related topics.\nCourse Objectives reason about security problems; learn to read, critique, and write security papers; better understand the review process; implement and evaluate a security prototype. Class format This is a seminar-type class. Every class we will discuss a different paper. I have selected a mix of recent and older papers. There will be two presenters during each class. Each presenter will play a different role: the role of the Advocate and the role of the Critic. 
The Advocate should play a role similar to that of the original authors and try to sell the work to the audience. On the other hand, the Critic, while remaining objective, should highlight the flaws of the paper towards the end of the presentation and convince the audience that the paper is not good. The Advocate presentation will last 20-25 minutes; you should motivate the work, summarize the paper, and present the results. The Critic presentation will last 10 minutes; you do not need to cover motivation or summarization; instead, you should focus on the shortcomings of the paper. While shorter, the critical presentation is probably harder to prepare.\nIn order to do well during your presentation, you should remember to stick to your role (Advocate or Critic). Further, you do not need to spend too much time explaining the basics of the paper; everyone in the class will have already read it. What adds value to the presentation is your opinion and the insights you can extract from the paper! This is what you should focus on.\nYou should expect to present at least 2 or 3 times during the term depending on the number of students registered. After the presentation, we will take a 10-minute break and discuss the paper. You should come prepared for those discussions and be ready to engage. Submissions are to be made on Canvas, unless specified otherwise.\nPaper reports For each assigned paper you must write a report. You are to use the USENIX LaTeX template for formatting. You must submit your reports on Canvas. In your report, please follow this structure:\n Paper Summary (no more than 250 words) Provide a brief summary of the paper (3-5 sentences is usually enough). The aim is to demonstrate that you’ve read (and understood) the paper, so try to paraphrase and extract the essentials. At this stage you should aim to be objective; later sections allow for your own opinion.\nAnswer the following (no more than 750 words in total) The Problem What is the problem? 
Why is it important? Why is previous work insufficient (or why has the problem not been solved before, e.g., it is a new problem the authors have identified)? This is your take on what the authors say in the paper (so again should be fairly objective). If the paper doesn’t seem to tackle a particular problem, then focus on the primary motivation for the work. 1-2 sentences for each of the three questions is probably sufficient.\nThe Solution (or Approach) What is their approach/solution? How does it solve the problem? How is the solution unique and/or innovative (if it is)? What are the details? Once more you should use the paper itself as the source to help you answer these questions, but, as in previous parts, please do not just copy sections from the paper. Instead, you should focus on paraphrasing/synopsizing, and extracting the essential details. Depending on the paper, you’ll probably need 5-10 sentences here.\nEvaluation How do they evaluate their solution? What questions do they set out to answer? What does the evaluation say about the strengths and weaknesses of their system? What do you think are the strengths and weaknesses of the evaluation itself? A total of 3-4 sentences should suffice here; we’re looking for highlights, not a point-by-point reproduction of the evaluation section(s). 
In the rare case that there is no evaluation section, skip this part of the report.\nQuestions for the Authors Imagine you’re attending a talk about this paper given …","date":1654128000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1682736608,"objectID":"339f4ed1d6cdf6db844586519aa89c4f","permalink":"https://tfjmp.org/post/2022-538p/","publishdate":"2022-06-02T00:00:00Z","relpermalink":"/post/2022-538p/","section":"post","summary":"Curriculum for [538P] Topic in Computer Systems: Systems Security (2022-2023).","tags":["Academic"],"title":"[538P] Topic in Computer Systems: Systems Security (2022-2023)","type":"post"},{"authors":["Thomas Pasquier"],"categories":["Advising"],"content":"The official guidelines are provided on the department website. On this page, I give my take on the process and provide some pointers.\nThe goal of the RPE The purpose of the RPE is to 1) make sure you are ready for research; and 2) check that the supervisory relationship is working. It is best to identify problems early, and the goal is not to fail you but to ensure you are ready to succeed. In my opinion, the RPE should be a well-defined, reasonably scoped, and self-contained project. The RPE is not your Ph.D.; you should think of it as starting to work on your first paper (at UBC), and it should be scoped accordingly. You may have a grand ambition for your Ph.D., but you need to wrap up the RPE project in only a few months. Consequently, you need to identify a set of clear and meaningful objectives where you can make significant progress within the available time frame. There is an expectation that your RPE will lead to a publication, but this is not expected to happen before your defense. However, you need to show you are working in that direction (e.g., by showing preliminary results and a prototype). 
Finally, the self-contained aspect is important: if you are working as part of a larger project, the RPE must be based on your work and your contribution must be clearly identifiable. My role is to work with you, support you, and make sure things go smoothly.\nDeliverable For written reports, I suggest using the USENIX template.\nRPE proposal: expected length is around two pages. This should focus on the problem definition. However, from experience, most students submit slightly longer proposals (5-6 pages) including an expanded related work section and a description of their proposed solution. This is not necessary, but you should feel free to include this if you want feedback on those aspects.\nRPE report: expected length is around twelve pages. Your report should be written and organized like a conference paper. You may consider including a long discussion section containing: 1) limitations of your current prototype/proof of concept; 2) work you are planning to do over the next few months to turn your RPE work into a publication.\nRPE presentation: expected length is around twenty minutes. You should present your work as you would at a conference. You should cover the following topics: 1) context and problem; 2) solution; 3) evaluation; and 4) future work. You should rehearse your presentation (you can ask me and your peers).\nYou should not hesitate to discuss with other students in the lab, and ask for examples of their submissions. Further, I am expecting to see several drafts of your written work/presentation. I will normally ask for them, but you should feel free to send them to me as soon as you want feedback. Unlike work you do for a joint paper submission, I will avoid rewriting your text. However, I am more than happy to (and will) comment. I also encourage you to share drafts with some of your peers working on different projects (forming a writing group with other students taking their RPE at the same time can be a very positive experience). 
Sharing your work with peers helps you gauge how accessible your writing is.\nDeadlines Assume a student starting in September (if you start in January, tweak the timeline accordingly). Here is my vision of how you should approach the RPE process. Remember, as your advisor/supervisor it is my job to get you through this process successfully. This means you should talk to me and seek help and support.\nSeptember, you hopefully have an idea of why you came to UBC and what you want to work on. You should spend your first few months getting familiar with the relevant literature, writing an annotated bibliography/notes about your reading, and starting to design and implement prototypes to test your ideas. In addition, you may also decide to join an existing project, get up to speed with the project, and participate actively. During our 1-1 meetings, I would expect to discuss your growing understanding of the field and your ideas. Part of my job is to challenge your ideas and force you to think critically about the problems you propose to investigate and potential solutions. Defining research problems is one of the most important skills to acquire early on.\nDecember, you should start crystallizing the research problem you want to tackle during your RPE. Identify the relevant literature and the state of the art in that space.\nJanuary, this is the time to start working on the proposal. In addition to formalizing and writing down your proposal, you should consider doing some preliminary technical experimentation. My advice is to take only a single course during this term. You want to ensure you have enough time dedicated to research.\nMarch, we select your RPE committee together. 
This is relatively simple: we need to identify two faculty members whose interests overlap with your RPE project and who have time to sit on your committee.\nApril, you submit your proposal and get …","date":1652572800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1652925499,"objectID":"55fd25ba35946e89b8ef51ec05f81277","permalink":"https://tfjmp.org/post/rpe/","publishdate":"2022-05-15T00:00:00Z","relpermalink":"/post/rpe/","section":"post","summary":"On this page, I discuss the RPE process in the department of computer science at UBC.","tags":["Academic"],"title":"The Research Proficiency Evaluation process","type":"post"},{"authors":["Thomas Pasquier"],"categories":["Misc"],"content":" ","date":1649030400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1682361702,"objectID":"c7f4c0f9e7ebce4f5b32b92495d4be11","permalink":"https://tfjmp.org/post/calendar/","publishdate":"2022-04-04T00:00:00Z","relpermalink":"/post/calendar/","section":"post","summary":"This is my calendar availability.","tags":["Academic"],"title":"Calendar","type":"post"},{"authors":["Thomas Pasquier"],"categories":["Advising"],"content":"body {text-align: justify} The goal of this document is to set mutual expectations on our collaboration. We collaboratively update this document once a year.\nAdvisory goals My goal in advising graduate students is to help train them to become independent researchers. This encompasses both the general scientific and research process, from identifying a compelling research question to communicating results through papers and presentations, as well as discipline-specific skills.\nWhile these skills are particularly useful in academia, they are also useful in other contexts. Some students will pursue careers as research-track academics, while others may become teaching-oriented professors, industry researchers, or entrepreneurs. 
I am happy to work with students with any of these goals; students should discuss these types of goals with me on occasion, particularly as their thinking changes about future plans or leading up to relevant milestones in their degree.\nWorking environment and relationship Meetings and other regular communication I structure my regular meetings with students on a per-project basis, which may mean that a given group of students typically meets me together. We typically meet weekly, during which each student shares updates on their recent progress. We also have a lab-wide meeting once a week. I expect us to meet even if you don’t expect to have substantial topics to talk about.\nBefore the meeting: send me anything you’d like to discuss (e.g., a paper draft) the day before the meeting. Take notes during the meeting, and keep them in a location we can both have access to. Not having substantial updates for a regular meeting should be rare. Doing it more than once in a while is an indication that you’re regularly having unproductive stretches, and we should discuss why this is and what to do about it.\nOutside of these regularly arranged times, I am often available for impromptu discussion. Feel free to ask for my time whenever it would be helpful, and allow me to say no if I can’t. To arrange additional discussions, get in touch with me via Slack to schedule a time to meet.\nCommunication from me to you outside of meetings: I may message you via Slack at any time, but I do not expect you to reply outside of your typical working hours unless otherwise agreed for a particular reason, e.g., an imminent conference deadline. I do expect a response within 1 day (not including weekends), even if it is only to acknowledge the note and say that a more complete response is forthcoming. And likewise, from you to me: feel free to message me at any time; I likewise may not respond until my typical working hours. 
I prefer that you contact me via Slack except in exceptional situations.\nLab meetings and events In addition to regular meetings with me, you are also expected to generally attend and contribute to Systopia’s reading group. This is valuable both to you and to others. While you will have much of your own work to do, being a member of the lab is about more than just writing your own papers. Attending talks by your labmates, giving feedback, discussing other research areas, ideas, and process, and even just talking about academia over a coffee all contribute indirectly to your training and the training of others in the lab. I expect you to participate in such events regularly.\nWorking hours During periods for which you are doing research with me, you should treat your degree as a full-time job. Some time will be spent taking courses or serving as a TA as part of your funding; the remainder of your time should be spent on research. It doesn’t matter to me when you work, outside of our mutually arranged meetings and other responsibilities, as long as you are making good progress. To maximize overlap with your labmates, I’d like you to be generally available Monday-Friday (except on holidays) in the core hours between 10am-3pm, but whether you typically start earlier or work later is up to your personal preference.\nPhysical space: It is valuable for you to interact with others in the group and the department; this benefits not just you but also your labmates. I therefore expect you to typically work from the lab space at least two days a week. I encourage you to try to be in the lab at least between 10am - 3pm, when most of the lab members will be around. The rest of the time, I don’t mind if you prefer to work from home or elsewhere, but try to respond promptly by Slack during your typical working hours. 
If you will be working from home or otherwise away from the lab for more than a week, please let me know.\nIf you need to work remotely for personal or medical reasons, please discuss this with me. You should not do so during your first year.\nVacation It’s important to take a break from time to time. The UBC policy is that graduate students have three weeks of vacation time (15 days), in addition to the week the university is closed between …","date":1649030400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1756404069,"objectID":"b9dd084d4b8e7466327dce9a3427daf0","permalink":"https://tfjmp.org/post/working-with-me/","publishdate":"2022-04-04T00:00:00Z","relpermalink":"/post/working-with-me/","section":"post","summary":"On this page, I describe how I run my lab and what you should expect when working with us!","tags":["Academic"],"title":"Supervisory expectations","type":"post"},{"authors":["A Trisovic","M K Lau","T Pasquier","M Crosas"],"categories":null,"content":"","date":1645401600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1649967013,"objectID":"4845920d6c242244a5395239b97491e6","permalink":"https://tfjmp.org/publication/2022-scidata/","publishdate":"2017-01-01T00:00:00Z","relpermalink":"/publication/2022-scidata/","section":"publication","summary":"This article presents a study on the quality and execution of research code from publicly-available replication datasets at the Harvard Dataverse repository. Research code is typically created by a group of scientists and published together with academic papers to facilitate research transparency and reproducibility. For this study, we define ten questions to address aspects impacting research reproducibility and reuse. First, we retrieve and analyze more than 2000 replication datasets with over 9000 unique R files published from 2010 to 2020. Second, we execute the code in a clean runtime environment to assess its ease of reuse. 
Common coding errors were identified, and some of them were solved with automatic code cleaning to aid code execution. We find that 74% of R files failed to complete without error in the initial execution, while 56% failed when code cleaning was applied, showing that many errors can be prevented with good coding practices. We also analyze the replication datasets from journals’ collections and discuss the impact of the journal policy strictness on the code re-execution rate. Finally, based on our results, we propose a set of recommendations for code dissemination aimed at researchers, journals, and repositories.","tags":null,"title":"A large-scale study on research code quality and execution","type":"publication"},{"authors":["Thomas Pasquier"],"categories":["Applications"],"content":"Candidate presentation (15min) We will start the interview with a brief presentation to give you an opportunity to give voice to your application material. Use this presentation to explain who you are, what you have done, and what you want to do.\n Introduce yourself (~1min) Discuss a research project you worked on, clearly focusing on your contribution (~10min) Discuss why you want to go to UBC/work with X supervisor/advisor (~4min) We are aware that there are systemic barriers that prevent students from realizing their full potential. For example, you may not have been able to engage in as many research projects as you would have liked to, for a number of different reasons, such as the fact that your home institution is not a research-intensive university, you needed to work to pay your tuition, you had family responsibilities, etc. We understand this, and we want to consider each student’s potential and help them realize that potential. 
You should feel free to point this out during your presentation if you feel comfortable doing so and as you judge it to be appropriate.\nQuestions to the applicant (10min) Following your presentation, we will undoubtedly want to ask you questions about your work as well as questions to assess how well your research interests fit into our lab.\nPaper reading (10min) Please pick one paper from the following list:\n Negi, Parimarjan, et al. “Steering Query Optimizers: A Practical Take on Big Data Workloads.” Proceedings of the 2021 International Conference on Management of Data (SIGMOD). 2021 Paccagnella, Riccardo, et al. “Custos: Practical tamper-evident auditing of operating systems using trusted execution.” Network and Distributed System Security Symposium (NDSS). 2020 Alsaheel, Abdulellah, et al. “ATLAS: A Sequence-based Learning Approach for Attack Investigation.” 30th Security Symposium (USENIX Security). 2021 Bahmani, Raad, et al. “CURE: A Security Architecture with CUstomizable and Resilient Enclaves.” 30th Security Symposium (USENIX Security). 2021 You should pick a paper that appears to be the most relevant to the research you are thinking of pursuing. Reading academic papers is difficult; learning to do so will be an important skill you will acquire in graduate school. This part of the interview is intended to gauge your ability to engage with the literature and how you think about the research literature. Do not be intimidated, you will be assessed based on reasonable expectations for someone at your stage.\nYou may prepare some slides to discuss the following:\n Summarize key points/insights of the paper/what surprised you? [no need to explain the whole thing] (~5min) How would you extend/build on the paper? (~5min) Discussion about research (10min) Building on your presentation, we will discuss research directions you could explore during your degree. 
The goal is for all parties to gauge the fit between our interests and discuss the general area of research you will pursue during your time at UBC.\nQuestions from the applicant (15min) This is the time for you to ask questions about the lab, UBC, Vancouver, or anything else you would like to know. I am also more than happy to put you in touch with my students if you want to hear firsthand about students’ experiences.\nAfter the interview The time between the interview and a decision may vary. There are multiple factors at play: progress on interviewing other candidates on the shortlist, how busy the committee is, administrative aspects, etc. We try to give offers as early as possible, but the department continues to make offers all the way into April. Consequently, do not worry if you don’t hear back immediately. On the other hand, if you get another offer and you need to make a decision, do not hesitate to get in touch.\nOnce you get an offer, it is important for us to make sure you feel welcomed to the lab and prepare the support you will need. It is the right time to bring up issues you felt were not appropriate to discuss during the interview, but that you are legitimately concerned about. This might include questions around finances, accommodations (e.g., childcare, accessibility, etc.), immigration, and more. I may not be able to help with all those aspects, but I can ask or point you towards appropriate resources. We want to support our students not only as future researchers, but also as individuals, so please do not hesitate to bring up any concerns you have. This is also a time for you to ask questions to help you decide which offer to accept (and we hope it will be UBC)! 
We recruit bright students; most will have multiple offers, so you should feel free to discuss this openly.\nAcknowledgement Thanks to Margo and Aastha for their feedback!\n","date":1639094400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1645491633,"objectID":"e897196e937a0790bb122f6f98338d94","permalink":"https://tfjmp.org/post/2022-interview/","publishdate":"2021-12-10T00:00:00Z","relpermalink":"/post/2022-interview/","section":"post","summary":"Congratulations! You have been shortlisted and you will soon be interviewing.","tags":["Academic"],"title":"Graduate Admission Interview (Season 2022)","type":"post"},{"authors":[],"categories":null,"content":"","date":1638363600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1637563271,"objectID":"883526c5c03519491a85103d919be035","permalink":"https://tfjmp.org/talk/tracking-and-analyzing-provenance/","publishdate":"2017-01-01T00:00:00Z","relpermalink":"/talk/tracking-and-analyzing-provenance/","section":"event","summary":"Provenance is the representation of a system execution as a directed acyclic graph. Those graphs, representing the execution of an entire system from initialization to shut down, can be comprised of millions of graph elements. In this talk, we will discuss how we can capture provenance efficiently while providing guarantees about its completeness and accuracy. 
We will also look at how provenance can be used in practice, for example, to perform intrusion detection.","tags":[],"title":"Tracking and Analyzing Provenance","type":"event"},{"authors":["SY Lim","B Stelea","X Han","T Pasquier"],"categories":null,"content":" ","date":1635724800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1736805468,"objectID":"3c9ae855c79d8cbbc1c337201147cf33","permalink":"https://tfjmp.org/publication/2021-socc/","publishdate":"2017-01-01T00:00:00Z","relpermalink":"/publication/2021-socc/","section":"publication","summary":"Despite the wide usage of container-based cloud computing, container auditing for security analysis relies mostly on built-in host audit systems, which often lack the ability to capture high-fidelity container logs. State-of-the-art reference-monitor-based audit techniques greatly improve the quality of audit logs, but their system-wide architecture is too costly to be adapted for individual containers. Moreover, these techniques typically require extensive kernel modifications, making it difficult to deploy in practical settings. In this paper, we present saBPF (**s**ecure **a**udit **BPF**), an extension of the eBPF framework capable of deploying secure system-level audit mechanisms at the container granularity. We demonstrate the practicality of saBPF in Kubernetes by designing an audit framework, an intrusion detection system, and a lightweight access control mechanism. 
We evaluate saBPF and show that it is comparable in performance and security guarantees to audit systems from the literature that are implemented directly in the kernel.","tags":null,"title":"Secure Namespaced Kernel Audit for Containers","type":"publication"},{"authors":["X Han","X Yu","T Pasquier","D Li","J Rhee","J Mickens","M Seltzer","C Haifeng"],"categories":null,"content":" ","date":1628640000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1748543570,"objectID":"4d8ba397de00266c5ab0d0d39e2de526","permalink":"https://tfjmp.org/publication/2021-usenixsec/","publishdate":"2017-01-01T00:00:00Z","relpermalink":"/publication/2021-usenixsec/","section":"publication","summary":"Many users implicitly assume that software can only be exploited after it is installed. However, recent supply-chain attacks demonstrate that application integrity must be ensured during installation itself. We introduce SIGL, a new tool for detecting malicious behavior during software installation. SIGL collects traces of system call activity, building a data provenance graph that it analyzes using a novel autoencoder architecture with a graph long short-term memory network (graph LSTM) for the encoder and a standard multilayer perceptron for the decoder. SIGL flags suspicious installations as well as the specific installation-time processes that are likely to be malicious. Using a test corpus of 625 malicious installers containing real-world malware, we demonstrate that SIGL has a detection accuracy of 96%, outperforming similar systems from industry and academia by up to 87% in precision and recall and 45% in accuracy. We also demonstrate that SIGL can pinpoint the processes most likely to have triggered malicious behavior, works on different audit platforms and operating systems, and is robust to training data contamination and adversarial attack. 
It can be used with application-specific models, even in the presence of new software versions, as well as application-agnostic meta-models that encompass a wide range of applications and installers.","tags":null,"title":"SIGL: Securing Software Installations Through Deep Graph Learning","type":"publication"},{"authors":[],"categories":null,"content":"","date":1611835200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1614252487,"objectID":"bd887f9227f2f95cd286deb9577cef65","permalink":"https://tfjmp.org/talk/efficient-large-scale-data-provenance-tracking-and-analyzing-intrusion-detection/","publishdate":"2017-01-01T00:00:00Z","relpermalink":"/talk/efficient-large-scale-data-provenance-tracking-and-analyzing-intrusion-detection/","section":"event","summary":"Provenance is the representation of a system execution as a directed acyclic graph. Those graphs, representing the execution of an entire system from initialization to shut down, can be comprised of millions of graph elements. After a general introduction to the field of data provenance, it will present my recent work on the development of a provenance-based intrusion detection system. The system spans the entire software stack from the kernel-level capture mechanism to the algorithm used to perform intrusion detection. This talk is based on papers published at ACM CCS, NDSS and USENIX Security. 
I will be available after the talk for further technical discussions.","tags":[],"title":"Efficient Large-Scale Data Provenance Tracking and Analyzing: Intrusion Detection","type":"event"},{"authors":null,"categories":null,"content":"","date":1607410800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1607410800,"objectID":"1dc5509f30c42bbfa24bb7b702f9b095","permalink":"https://tfjmp.org/talk/building-a-provenance-based-intrusion-detection-system/","publishdate":"2020-01-01T00:00:00Z","relpermalink":"/talk/building-a-provenance-based-intrusion-detection-system/","section":"event","summary":"Provenance is the representation of a system execution as a directed acyclic graph. Those graphs, representing the execution of an entire system from initialization to shut down, can be comprised of millions of graph elements. In this talk, I will present my work on the development of a provenance-based intrusion detection system. I will discuss the development of the stack from the kernel-level capture mechanism to the algorithm used to perform intrusion detection. Finally, I will discuss planned future work and areas of potential collaborations. This talk is based on papers published at ACM CCS, NDSS and Usenix Security.","tags":null,"title":"Building a provenance-based intrusion detection system","type":"event"},{"authors":null,"categories":null,"content":"","date":1606388400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1606388400,"objectID":"04737812b68c1686e6c8eb08672bd84f","permalink":"https://tfjmp.org/talk/building-a-provenance-based-intrusion-detection-system/","publishdate":"2020-01-01T00:00:00Z","relpermalink":"/talk/building-a-provenance-based-intrusion-detection-system/","section":"event","summary":"Provenance is the representation of a system execution as a directed acyclic graph. Those graphs, representing the execution of an entire system from initialization to shut down, can be comprised of millions of graph elements. 
In this talk, I will present my work on the development of a provenance-based intrusion detection system. I will discuss the development of the stack from the kernel-level capture mechanism to the algorithm used to perform intrusion detection. Finally, I will discuss planned future work and areas of potential collaborations. This talk is based on papers published at ACM CCS, NDSS and Usenix Security.","tags":null,"title":"Building a provenance-based intrusion detection system","type":"event"},{"authors":null,"categories":null,"content":"","date":1605175200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1637562952,"objectID":"f5a9c1f8e2b279713bc9125dab5ceea2","permalink":"https://tfjmp.org/talk/provenance-based-intrusion-detection/","publishdate":"2020-01-01T00:00:00Z","relpermalink":"/talk/provenance-based-intrusion-detection/","section":"event","summary":"Provenance is the representation of a system execution as a directed acyclic graph. Those graphs, representing the execution of an entire system from initialization to shut down, can be comprised of millions of graph elements. In this talk, I will give an overview of my work on the development of a provenance-based intrusion detection system. I will discuss the development of the stack from the kernel-level capture mechanism to the algorithm used to perform intrusion detection. This talk is based on papers published at ACM CCS, NDSS and Usenix Security.","tags":null,"title":"Provenance-based intrusion detection","type":"event"},{"authors":null,"categories":null,"content":"","date":1604607300,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1604607300,"objectID":"55763f9547975c023dc938814e5fdf8b","permalink":"https://tfjmp.org/talk/to-tune-or-not-to-tune/","publishdate":"2020-01-01T00:00:00Z","relpermalink":"/talk/to-tune-or-not-to-tune/","section":"event","summary":"Thomas Pasquier is currently an Assistant Professor at the University of Bristol (UK). 
Thomas has been working on how to make systems more transparent and how to use the insights gained. During today’s session, Thomas will discuss his work with colleagues at Cambridge on auto-tuning on the Spark platform. We proposed an (open-source) extension to Spark which learns to automatically select good configurations as more workloads are executed. This work was recently discussed in a [KDD article](https://tfjmp.org/files/publications/2020-kdd.pdf).","tags":null,"title":"To Tune or not To Tune","type":"event"},{"authors":["A Fekry","L Carata","T Pasquier","A Rice"],"categories":null,"content":"","date":1603152000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1641365173,"objectID":"10b4a256d74cae7f3c7c97c5acb35d1e","permalink":"https://tfjmp.org/publication/2020-bigdata/","publishdate":"2020-10-20T00:00:00Z","relpermalink":"/publication/2020-bigdata/","section":"publication","summary":"One of the key challenges for data analytics deployment is configuration tuning. The existing approaches for configuration tuning are expensive and overlook the dynamic characteristics of the analytics environment (i.e., frequent changes in workload due to evolving input sizes or changes in the underlying cluster environment). Such workload/environment changes can cause significant performance degradation; retuning the configuration to accommodate those changes can yield up to 85% potential execution time savings. We propose SimTune, an approach that accommodates such changes through efficient configuration tuning. SimTune combines workload characterization and Multitask Bayesian optimization to identify similarity across workloads and accelerate finding near-optimal configurations. Our experimental results show that SimTune reduces the search time for finding close to optimal configurations by 56-73% (at the median) when compared to existing state-of-the-art techniques. 
This means that the amortization of the tuning cost happens significantly faster, enabling practical tuning in the rapidly changing environment of distributed analytics.","tags":null,"title":"Accelerating the Configuration Tuning of Big Data Analytics with Similarity-aware Multitask Bayesian Optimization","type":"publication"},{"authors":["A Fekry","L Carata","T Pasquier","A Rice","A Hopper"],"categories":null,"content":"","date":1588291200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1736805468,"objectID":"06c6ba75cf5961bcc7462a9211b6a9e7","permalink":"https://tfjmp.org/publication/2020-kdd/","publishdate":"2020-05-01T00:00:00Z","relpermalink":"/publication/2020-kdd/","section":"publication","summary":"This experimental study presents a number of issues that pose a challenge for practical configuration tuning and its deployment in data analytics frameworks. These issues include: 1) the assumption of a static workload or environment, ignoring the dynamic characteristics of the analytics environment ( e.g., increase in input data size, changes in allocation of resources). 2) the amortization of tuning costs and how this influences what workloads can be tuned in practice in a cost-effective manner. 3) the need for a comprehensive incremental tuning solution for a diverse set of workloads. We adapt different ML techniques in order to obtain efficient incremental tuning in our problem domain, and propose Tuneful, a configuration tuning framework. We show how it is designed to overcome the above issues and illustrate its applicability by running a wide array of experiments in cloud environments provided by two different service providers.","tags":null,"title":"To Tune or Not to Tune? 
In Search of Optimal Configurations for Data Analytics","type":"publication"},{"authors":["X Han","J Mickens","A Gehani","M Seltzer","T Pasquier"],"categories":null,"content":" ","date":1588291200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1641365173,"objectID":"e6d4b83bd2c8349bb39c25e46a0978f0","permalink":"https://tfjmp.org/publication/2020-precs/","publishdate":"2020-05-01T00:00:00Z","relpermalink":"/publication/2020-precs/","section":"publication","summary":"Host-based anomaly detectors generate alarms by inspecting audit logs for suspicious behavior. Unfortunately, evaluating these anomaly detectors is hard. There are few high-quality, publiclyavailable audit logs, and there are no pre-existing frameworks that enable push-button creation of realistic system traces. To make trace generation easier, we created Xanthus, an automated tool that orchestrates virtual machines to generate realistic audit logs. Using Xanthus’ simple management interface, administrators select a base VM image, configure a particular tracing framework to use within that VM, and define post-launch scripts that collect and save trace data. Once data collection is finished, Xanthus creates a self-describing archive, which contains the VM, its configuration parameters, and the collected trace data. 
We demonstrate that Xanthus hides many of the tedious (yet subtle) orchestration tasks that humans often get wrong; Xanthus avoids mistakes that lead to non-replicable experiments","tags":null,"title":"Xanthus: Push-button Orchestration of Host Provenance Data Collection","type":"publication"},{"authors":["X Han","T Pasquier","A Bates","J Mickens","M Seltzer"],"categories":null,"content":" ","date":1583020800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1736805468,"objectID":"761626ce975ec661e9ac468b242fb563","permalink":"https://tfjmp.org/publication/2020-ndss/","publishdate":"2020-03-01T00:00:00Z","relpermalink":"/publication/2020-ndss/","section":"publication","summary":"Advanced Persistent Threats (APTs) are difficult to detect due to their low-and-slow attack patterns and frequent use of zero-day exploits. We present UNICORN, an anomaly-based APT detector that effectively leverages data provenance analysis. From modeling to detection, UNICORN tailors its design specifically for the unique characteristics of APTs. Through extensive yet time-efficient graph analysis, UNICORN explores provenance graphs that provide rich contextual and historical information to identify stealthy anomalous activities without pre-defined attack signatures. Using a graph sketching technique, it summarizes long-running system execution with space efficiency to combat slow-acting attacks that take place over a long time span. UNICORN further improves its detection capability using a novel modeling approach to understand long-term behavior as the system evolves. 
Our evaluation shows that UNICORN outperforms an existing state-of-the-art APT detection system and detects real-life APT scenarios with high accuracy.","tags":null,"title":"UNICORN: Runtime Provenance-Based Detector for Advanced Persistent Threats","type":"publication"},{"authors":["M K Lau","T Pasquier","M Seltzer"],"categories":null,"content":"","date":1581811200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1641365173,"objectID":"e6f8c6773b35ca3f76103e41ecbf3098","permalink":"https://tfjmp.org/publication/2020-joss/","publishdate":"2020-02-16T00:00:00Z","relpermalink":"/publication/2020-joss/","section":"publication","summary":"The growth of programming in the sciences has been explosive in the last decade. This has facilitated the rapid advancement of science through the agile development of computational tools. However, concerns have begun to surface about the reproducibility of scientific research in general (Baker, 2016) and the potential issues stemming from issues with analytical software (Stodden, Seiler, \u0026 Ma, 2018). Specifically, there is a growing recognition across disciplines that simply making data and software “available” is not enough and that there is a need to improve the transparency and stability of scientific software (Pasquier et al., 2018).","tags":null,"title":"Rclean: A Tool for Writing Cleaner, More Transparent Code","type":"publication"},{"authors":["D O'Keeffe","A Vranaki","T Pasquier","D Eyers"],"categories":null,"content":"","date":1580860800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1641365173,"objectID":"2080245670e90cc3c48a576f1ee2b2be","permalink":"https://tfjmp.org/publication/2020-ic2e/","publishdate":"2020-02-05T00:00:00Z","relpermalink":"/publication/2020-ic2e/","section":"publication","summary":"A cloud provider that can technically determine tenants' operations may be compelled to disclose such activities by law enforcement agencies (LEAs). 
The situation gets even more complex when multiple LEAs across different jurisdictions are involved, e.g., because of the distributed locations of cloud servers and data storage. Yet cloud providers typically do not need or want to know about their tenants' activities, other than measuring how such activities incur expenses for using cloud resources. Thus mechanisms should be developed for cloud providers to have sufficient plausible deniability with regards to the processing being carried out by tenants on their platform, in jurisdictions that permit cloud providers to avoid liabilities in this way. Symmetrically, such mechanisms could protect tenants from legal over-reach, for example, when the country in which the cloud provider is incorporated could force disclosure of the processing carried out by cloud tenants. But to what extent can cloud providers acquire plausible deniability? Current discussions regarding risk have focused on data confidentiality and integrity. We argue that processing operations can equally reveal sensitive information---such as trade secrets and business processes---and that for some classes of application both data protection and algorithm protection are necessary. In this paper, we examine the legal and technical motivations for achieving plausible deniability in cloud interactions. We demonstrate the likely performance overhead of using containers secured with technologies such as Intel SGX. 
Further, we examine the current limitations of our proposed plausible deniability mechanisms, and outline a potential approach for enabling lawful access to enclaves subject to appropriate judicial oversight.","tags":null,"title":"Facilitating plausible deniability for cloud providers regarding tenants' activities using trusted execution","type":"publication"},{"authors":null,"categories":null,"content":"","date":1579100400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1579100400,"objectID":"1cdfc4445a5565f8169418eb4f044866","permalink":"https://tfjmp.org/talk/provenance-based-intrusion-detection/","publishdate":"2020-01-01T00:00:00Z","relpermalink":"/talk/provenance-based-intrusion-detection/","section":"event","summary":"Whole-system provenance is the record of flows of information between kernel objects (e.g., files, task, sockets etc.). This information is represented as a directed acyclic graph that can be analysed to extract information about the execution of the system. Building on the DARPA transparent computing programme a number of research groups have explored means to develop provenance-based intrusion detection systems. In this talk, we will discuss how provenance can be captured and analysed to achieve such an objective.","tags":null,"title":"Provenance-based Intrusion Detection","type":"event"},{"authors":["C Mistry","B Stelea","V Kumar","T Pasquier"],"categories":null,"content":"","date":1577836800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1624583111,"objectID":"5fa922cdf7786ccd72e2103817708731","permalink":"https://tfjmp.org/publication/2020-cloudcom/","publishdate":"2017-01-01T00:00:00Z","relpermalink":"/publication/2020-cloudcom/","section":"publication","summary":"The rise of IoT has led to large volumes of personal data being produced at the network’s edge. Most IoT applications process data in the cloud raising concerns over privacy and security. 
As many IoT applications are event-based and are implemented on cloud-based, serverless platforms, we’ve seen a number of proposals to deploy serverless solutions at the edge to address concerns over data transfer. However, conventional serverless platforms use container technology to run user-defined functions. Containers introduce their own security issues – due to a large trusted computing base – and performance issues, including long initialisation times. Additionally, OpenWhisk, a popular and widely used container-based serverless platform available for edge devices, performs relatively poorly, as we demonstrate in our evaluation. In this paper, we propose to investigate unikernels as a solution to build a serverless platform at the edge, addressing in particular performance and security concerns. We present UniFaaS, a prototype edge-serverless platform which leverages unikernels – tiny library single-address-space operating systems that only contain the parts of the OS needed to run a given application – to execute functions. The result is a serverless platform with extremely low memory and CPU footprints, and excellent performance. UniFaaS has been designed to be deployed on low-powered single-board computer devices, such as Raspberry Pi or Arduino, without compromising on performance.","tags":[],"title":"Demonstrating the Practicality of Unikernels to Build a Serverless Platform at the Edge","type":"publication"},{"authors":null,"categories":null,"content":"","date":1573465500,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1573465500,"objectID":"0b411061daa1c0185886599d66fb3898","permalink":"https://tfjmp.org/talk/building-a-provenance-based-ids-and-the-questions-we-ask-ourselves/","publishdate":"2019-10-16T00:00:00Z","relpermalink":"/talk/building-a-provenance-based-ids-and-the-questions-we-ask-ourselves/","section":"event","summary":"Provenance is the representation of a system execution as a directed acyclic graph. 
A whole-system provenance graph, representing the execution of an entire system from initialization to shutdown, can comprise millions of graph elements. It is believed that the use of such graphs can help build better intrusion detection systems. We have attempted to build full-stack intrusion detection systems, from kernel capture up to data analysis. In the spirit of a constructive workshop, in this talk, I will present those attempts, discussing our design decisions and the questions that we need to answer.","tags":null,"title":"Building a provenance-based IDS and the questions we ask ourselves","type":"event"},{"authors":["S C Chan","J Cheney","P Bhatotia","A Gehani","H Irshad","T Pasquier","L Carata","M Seltzer"],"categories":null,"content":"","date":1567641600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1641365173,"objectID":"3f413fb224ffa10f810d30030b90e047","permalink":"https://tfjmp.org/publication/2019-middleware/","publishdate":"2019-09-05T00:00:00Z","relpermalink":"/publication/2019-middleware/","section":"publication","summary":"System-level provenance is of widespread interest for applications such as security enforcement and information protection. However, testing the correctness or completeness of provenance capture tools is challenging and currently done manually. In some cases there is not even a clear consensus about what behavior is correct. We present an automated tool, ProvMark, that uses an existing provenance system as a black box and reliably identifies the provenance graph structure recorded for a given activity, by a reduction to subgraph isomorphism problems handled by an external solver. ProvMark is a beginning step in the much-needed area of testing and comparing the expressiveness of provenance systems. 
We demonstrate ProvMark’s usefulness in comparing three capture systems with different architectures and distinct design philosophies.","tags":null,"title":"ProvMark: A Provenance Expressiveness Benchmarking System","type":"publication"},{"authors":["T Pasquier","D Eyers","M Seltzer"],"categories":null,"content":"","date":1563753600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1641365173,"objectID":"61bb03e749459e82746df27bb95340ff","permalink":"https://tfjmp.org/publication/2019-poly/","publishdate":"2019-07-22T00:00:00Z","relpermalink":"/publication/2019-poly/","section":"publication","summary":"Valuable, sensitive, and regulated data flow freely through distributed systems. What governs the collection, use, and management of such data? We claim that distributed data provenance, the directed acyclic graph documenting the origin and transformations of data, holds the key. Provenance analysis has already been demonstrated in a wide range of applications: from intrusion detection to performance analysis. We describe how similar systems and analysis techniques are suitable both for implementing the complex policies that govern data and for verifying compliance with regulatory mandates. We also highlight the challenges to be addressed to move provenance from research laboratories to production systems.","tags":null,"title":"From Here to Provtopia","type":"publication"},{"authors":null,"categories":null,"content":"","date":1559817000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1559817000,"objectID":"07bd1f458408888ce691a120bd5d54c0","permalink":"https://tfjmp.org/talk/towards-provenance-based-intrusion-detection/","publishdate":"2019-01-01T00:00:00Z","relpermalink":"/talk/towards-provenance-based-intrusion-detection/","section":"event","summary":"Provenance is the representation of a system execution as a directed acyclic graph. 
A whole-system provenance graph, representing the execution of an entire system from initialization to shutdown, can comprise millions of graph elements. In this talk, I will present my work on the development of a provenance-based intrusion detection system. I will discuss the development of the stack from the kernel-level capture mechanism to the algorithm used to perform intrusion detection. Finally, I will discuss planned future work and areas of potential collaboration.","tags":null,"title":"Towards provenance-based intrusion detection","type":"event"},{"authors":["A Fekry","L Carata","T Pasquier","Andrew Rice","Andy Hopper"],"categories":null,"content":"","date":1558224000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1641365173,"objectID":"e1ecc20fe2f7ceb7bf494f9e768d27c8","permalink":"https://tfjmp.org/publication/2019-icdcs/","publishdate":"2019-05-19T00:00:00Z","relpermalink":"/publication/2019-icdcs/","section":"publication","summary":"The execution of distributed data processing workloads (such as those running on top of Hadoop or Spark) in cloud environments presents a unique opportunity to explore multiple trade-offs between elasticity (and the types of resources being allocated), overall runtime and total costs. However, beyond high-level constraints and objectives, it's not the end-users who should be mainly concerned with those optimizations, but the cloud providers. They have the vantage point to collect actionable information, the economies of scale, and the position to adjust parameters when dynamic conditions change, in order to fulfil SLOs that go beyond classic measures of latency and throughput. This is at odds with the existing approach of making software (including the interfaces to the cloud and the processing frameworks) as configurable as possible. 
We propose that rather than configurability, self-tunability (or the illusion of it as far as the end-user is concerned) is a better long-term goal.","tags":null,"title":"Towards Seamless Configuration Tuning of Big Data Analytics","type":"publication"},{"authors":["T Pasquier","D Eyers","J Bacon"],"categories":null,"content":"","date":1553731200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1641365173,"objectID":"5865f9abaee50c3907cb4a7bbb34807e","permalink":"https://tfjmp.org/publication/2019-acm-com/","publishdate":"2019-03-28T00:00:00Z","relpermalink":"/publication/2019-acm-com/","section":"publication","summary":"The Internet of Things promises a connected environment reacting to and addressing our every need, but based on the assumption that all of our movements and words can be recorded and analysed to achieve this end. Ubiquitous surveillance is also a precondition for most dystopian societies, both real and fictional. How our personal data is processed and consumed in an ever more connected world must imperatively be made transparent, and more effective technical solutions than those currently on offer, to manage personal data must urgently be investigated.","tags":null,"title":"Viewpoint | Personal Data and the Internet of Things: It is time to care about digital provenance","type":"publication"},{"authors":null,"categories":null,"content":"","date":1552298400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1552298400,"objectID":"22766a8517bea86f36316f095712d3e0","permalink":"https://tfjmp.org/talk/towards-provenance-based-intrusion-detection/","publishdate":"2019-01-01T00:00:00Z","relpermalink":"/talk/towards-provenance-based-intrusion-detection/","section":"event","summary":"In this talk, provenance-based intrusion detection will be discussed. We are building a full stack solution to perform host-based intrusion detection using information flow graph to represent a system execution. 
The talk will cover topics ranging from the kernel instrumentation to capture the relevant data, to the ML techniques used to perform the analysis. Published material and source code relating to this project can be found online at [http://camflow.org](http://camflow.org).","tags":null,"title":"Towards provenance-based intrusion detection","type":"event"},{"authors":null,"categories":null,"content":"","date":1548169200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1548169200,"objectID":"873856a13c0bb97e9d0a4326f959829a","permalink":"https://tfjmp.org/talk/building-a-provenance-based-intrusion-detection-system/","publishdate":"2019-01-01T00:00:00Z","relpermalink":"/talk/building-a-provenance-based-intrusion-detection-system/","section":"event","summary":"Provenance is the representation of a system execution as a directed acyclic graph. Whole-system provenance graph, representing the execution of an entire system from initialization to shut down, can be comprised of millions of graph elements. In this talk, I will present my work on the development of a provenance-based intrusion detection system. I will discuss the development of the stack from the kernel-level capture mechanism to the algorithm used to perform intrusion detection. Finally, I will discuss planned future work and areas of potential collaborations.","tags":null,"title":"Building a provenance-based intrusion detection system","type":"event"},{"authors":null,"categories":null,"content":"","date":1544013000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1544013000,"objectID":"001ecec6d7a71c47bc868fb5f3cbed09","permalink":"https://tfjmp.org/talk/towards-a-provenance-based-intrusion-detection-system/","publishdate":"2018-01-01T00:00:00Z","relpermalink":"/talk/towards-a-provenance-based-intrusion-detection-system/","section":"event","summary":"Provenance is the representation of a system execution as a directed acyclic graph. 
A whole-system provenance graph, representing the execution of an entire system from initialisation to shutdown, can comprise millions of graph elements. In this talk, I will present my work on the development of a provenance-based intrusion detection system. I will discuss the development of the stack from the kernel-level capture mechanism to the algorithm used to perform intrusion detection. Finally, I will discuss planned future work and areas of potential collaboration.","tags":null,"title":"Towards a provenance-based intrusion detection system","type":"event"},{"authors":null,"categories":null,"content":" ","date":1539862200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1539862200,"objectID":"e67646f1c96322124257df7cc295dd51","permalink":"https://tfjmp.org/talk/runtime-analysis-of-whole-system-provenance/","publishdate":"2018-01-01T00:00:00Z","relpermalink":"/talk/runtime-analysis-of-whole-system-provenance/","section":"event","summary":"Identifying the root cause and impact of a system intrusion remains a foundational challenge in computer security. Digital provenance provides a detailed history of the flow of information within a computing system, connecting suspicious events to their root causes. Although existing provenance-based auditing techniques provide value in forensic analysis, they assume that such analysis takes place only retrospectively. Such post-hoc analysis is insufficient for realtime security applications; moreover, even for forensic tasks, prior provenance collection systems exhibited poor performance and scalability, jeopardizing the timeliness of query responses. We present CamQuery, which provides inline, realtime provenance analysis, making it suitable for implementing security applications. CamQuery is a Linux Security Module that offers support for both userspace and in-kernel execution of analysis applications. 
We demonstrate the applicability of CamQuery to a variety of runtime security applications including data loss prevention, intrusion detection, and regulatory compliance. In evaluation, we demonstrate that CamQuery reduces the latency of realtime query mechanisms, while imposing minimal overheads on system execution. CamQuery thus enables the further deployment of provenance-based technologies to address central challenges in computer security.","tags":null,"title":"Runtime Analysis of Whole-System Provenance","type":"event"},{"authors":["T Pasquier","X Han","T Moyer","A Bates","O Hermant","D Eyers","J Bacon","M Seltzer"],"categories":null,"content":" ","date":1539561600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1691594807,"objectID":"d727e5daf611e690ebe372bc0192d3f6","permalink":"https://tfjmp.org/publication/2018-ccs/","publishdate":"2018-10-15T00:00:00Z","relpermalink":"/publication/2018-ccs/","section":"publication","summary":"Identifying the root cause and impact of a system intrusion remains a foundational challenge in computer security. Digital provenance provides a detailed history of the flow of information within a computing system, connecting suspicious events to their root causes. Although existing provenance-based auditing techniques provide value in forensic analysis, they assume that such analysis takes place only retrospectively. Such post-hoc analysis is insufficient for realtime security applications; moreover, even for forensic tasks, prior provenance collection systems exhibited poor performance and scalability, jeopardizing the timeliness of query responses. We present CamQuery, which provides inline, realtime provenance analysis, making it suitable for implementing security applications. CamQuery is a Linux Security Module that offers support for both userspace and in-kernel execution of analysis applications. 
We demonstrate the applicability of CamQuery to a variety of runtime security applications including data loss prevention, intrusion detection, and regulatory compliance. In evaluation, we demonstrate that CamQuery reduces the latency of realtime query mechanisms, while imposing minimal overheads on system execution. CamQuery thus enables the further deployment of provenance-based technologies to address central challenges in computer security.","tags":null,"title":"Runtime Analysis of Whole-System Provenance","type":"publication"},{"authors":null,"categories":null,"content":"","date":1531310400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1531310400,"objectID":"547e5b590120ea4cc7511de895934932","permalink":"https://tfjmp.org/talk/provenance-based-intrusion-detection-opportunities-and-challenges/","publishdate":"2018-01-01T00:00:00Z","relpermalink":"/talk/provenance-based-intrusion-detection-opportunities-and-challenges/","section":"event","summary":"Intrusion detection is an arms race; attackers evade intrusion detection systems by developing new attack vectors to sidestep known defense mechanisms. Provenance provides a detailed, structured history of the interactions of digital objects within a system. It is ideal for intrusion detection, because it offers a holistic, attack-vector-agnostic view of system execution. As such, provenance graph analysis fundamentally strengthens detection robustness. 
We discuss the opportunities and challenges associated with provenance-based intrusion detection and provide insights based on our experience building such systems.","tags":null,"title":"Provenance-based Intrusion Detection: Opportunities and Challenges","type":"event"},{"authors":["X Han","T Pasquier","M Seltzer"],"categories":null,"content":"","date":1531180800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1641365173,"objectID":"9864ba3251919216eac71b4651b804ce","permalink":"https://tfjmp.org/publication/2018-tapp/","publishdate":"2018-07-10T00:00:00Z","relpermalink":"/publication/2018-tapp/","section":"publication","summary":"Intrusion detection is an arms race; attackers evade intrusion detection systems by developing new attack vectors to sidestep known defense mechanisms. Provenance provides a detailed, structured history of the interactions of digital objects within a system. It is ideal for intrusion detection, because it offers a holistic, attack-vector-agnostic view of system execution. As such, provenance graph analysis fundamentally strengthens detection robustness. We discuss the opportunities and challenges associated with provenance-based intrusion detection and provide insights based on our experience building such systems.","tags":null,"title":"Provenance-based Intrusion Detection: Opportunities and Challenges","type":"publication"},{"authors":null,"categories":null,"content":"","date":1528297200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1528297200,"objectID":"bc466be0eb0a8d8efc03c397a573885d","permalink":"https://tfjmp.org/talk/building-a-provenance-capture-mechanism/","publishdate":"2018-01-01T00:00:00Z","relpermalink":"/talk/building-a-provenance-capture-mechanism/","section":"event","summary":"There is a consensus that understanding data provenance, the origin and history of digital artifacts, is important. 
Whole-system provenance systems are capture mechanisms aimed at recording all information flows in an operating system. Such systems have been the subject of recent attention from the security research community. However, whole-system provenance has yet to make a significant impact outside of academic circles. In this talk, I will present our work on CamFlow, an open-source whole-system provenance implementation for Linux, and briefly introduce ongoing work on provenance-based intrusion detection as an application example. I will discuss the technical barriers to practical whole-system provenance we aimed to overcome, and those left to address.","tags":null,"title":"Building a provenance capture mechanism","type":"event"},{"authors":["T Pasquier","M K Lau","X Han","E Fong","B Lerner","E Boose","M Crosas","A Ellison","M Seltzer"],"categories":null,"content":"","date":1525132800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1641365173,"objectID":"51e79483808a8e4135a574ba4cdaf9f9","permalink":"https://tfjmp.org/publication/2018-cise/","publishdate":"2018-05-01T00:00:00Z","relpermalink":"/publication/2018-cise/","section":"publication","summary":"Open data and open-source software may be part of the solution to science’s “reproducibility crisis”, but they are insufficient to guarantee reproducibility. Requiring minimal end-user expertise, encapsulator creates a “time capsule” with reproducible code in a self-contained computational environment. 
encapsulator provides end-users with a fully-featured desktop environment for reproducible research.","tags":null,"title":"Sharing and Preserving Computational Analyses for Posterity with encapsulator","type":"publication"},{"authors":["T Pasquier","J Singh","J Powles","D Eyers","M Seltzer","J Bacon"],"categories":null,"content":"","date":1522540800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1641365173,"objectID":"e28f506c10ac1eda1c47a95a8341dcbb","permalink":"https://tfjmp.org/publication/2018-ubi/","publishdate":"2018-04-01T00:00:00Z","relpermalink":"/publication/2018-ubi/","section":"publication","summary":"Managing privacy in the IoT presents a significant challenge. We make the case that information obtained by auditing the flows of data can assist in demonstrating that the systems handling personal data satisfy regulatory and user requirements. Thus, components handling personal data should be audited to demonstrate that their actions comply with all such policies and requirements. A valuable side-effect of this approach is that such an auditing process will highlight areas where technical enforcement has been incompletely or incorrectly specified. There is a clear role for technical assistance in aligning privacy policy enforcement mechanisms with data protection regulations. The first step necessary in producing technology to accomplish this alignment is to gather evidence of data flows. 
We describe our work producing, representing and querying audit data and discuss outstanding challenges.","tags":null,"title":"Data provenance to audit compliance with privacy policy in the Internet of Things","type":"publication"},{"authors":null,"categories":null,"content":"","date":1515416400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1515416400,"objectID":"6f0c77686a73ae24a69cf5e9939541c9","permalink":"https://tfjmp.org/talk/towards-practical-whole-system-provenance/","publishdate":"2018-01-01T00:00:00Z","relpermalink":"/talk/towards-practical-whole-system-provenance/","section":"event","summary":"There is a consensus that understanding data provenance, the origin and history of digital artifacts, is important. Whole-system provenance systems are capture mechanisms aimed at recording all information flows in an operating system. Such systems have been the subject of recent attention from the security research community. However, whole-system provenance has yet to make a significant impact outside of academic circles. In this talk, I will present our work on CamFlow, an open-source whole-system provenance implementation for Linux, and briefly introduce ongoing work on provenance-based intrusion detection as an application example. I will discuss the technical barriers to practical whole-system provenance we aimed to overcome, and those left to address.","tags":null,"title":"Towards practical whole-system provenance","type":"event"},{"authors":null,"categories":null,"content":"","date":1506334800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1506334800,"objectID":"d8884a66cb9c7cb9ce55c1653b6d49dd","permalink":"https://tfjmp.org/talk/practical-whole-system-provenance-capture/","publishdate":"2017-01-01T00:00:00Z","relpermalink":"/talk/practical-whole-system-provenance-capture/","section":"event","summary":"Data provenance describes how data came to be in its present form. 
It includes data sources and the transformations that have been applied to them. Data provenance has many uses, from forensics and security to aiding the reproducibility of scientific experiments. We present CamFlow, a whole-system provenance capture mechanism that integrates easily into a PaaS offering. While there have been several prior whole-system provenance systems that captured a comprehensive, systemic and ubiquitous record of a system’s behavior, none have been widely adopted. They either A) impose too much overhead, B) are designed for long-outdated kernel releases and are hard to port to current systems, C) generate too much data, or D) are designed for a single system. CamFlow addresses these shortcomings by: 1) leveraging the latest kernel design advances to achieve efficiency; 2) using a self-contained, easily maintainable implementation relying on a Linux Security Module, NetFilter, and other existing kernel facilities; 3) providing a mechanism to tailor the captured provenance data to the needs of the application; and 4) making it easy to integrate provenance across distributed systems. The provenance we capture is streamed and consumed by tenant-built auditor applications. We illustrate the usability of our implementation by describing three such applications: demonstrating compliance with data regulations; performing fault/intrusion detection; and implementing data loss prevention. 
We also show how CamFlow can be leveraged to capture meaningful provenance without modifying existing applications.","tags":null,"title":"Practical Whole-System Provenance Capture","type":"event"},{"authors":["T Pasquier","X Han","M Goldstein","T Moyer","D Eyers","M Seltzer","J Bacon"],"categories":null,"content":"","date":1506297600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1714570659,"objectID":"07f632907fed6a8aade00e2f45948e44","permalink":"https://tfjmp.org/publication/2017-socc/","publishdate":"2017-09-25T00:00:00Z","relpermalink":"/publication/2017-socc/","section":"publication","summary":"Data provenance describes how data came to be in its present form. It includes data sources and the transformations that have been applied to them. Data provenance has many uses, from forensics and security to aiding the reproducibility of scientific experiments. We present CamFlow, a whole-system provenance capture mechanism that integrates easily into a PaaS offering. While there have been several prior whole-system provenance systems that captured a comprehensive, systemic and ubiquitous record of a system’s behavior, none have been widely adopted. They either A) impose too much overhead, B) are designed for long-outdated kernel releases and are hard to port to current systems, C) generate too much data, or D) are designed for a single system. CamFlow addresses these shortcomings by: 1) leveraging the latest kernel design advances to achieve efficiency; 2) using a self-contained, easily maintainable implementation relying on a Linux Security Module, NetFilter, and other existing kernel facilities; 3) providing a mechanism to tailor the captured provenance data to the needs of the application; and 4) making it easy to integrate provenance across distributed systems. The provenance we capture is streamed and consumed by tenant-built auditor applications. 
We illustrate the usability of our implementation by describing three such applications: demonstrating compliance with data regulations; performing fault/intrusion detection; and implementing data loss prevention. We also show how CamFlow can be leveraged to capture meaningful provenance without modifying existing applications.","tags":null,"title":"Practical Whole-System Provenance Capture","type":"publication"},{"authors":["T Pasquier","M K Lau","A Trisovic","E Boose","B Couturier","M Crosas","A Ellisson","V Gibson","C Jones","M Seltzer"],"categories":null,"content":"","date":1504569600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1641365173,"objectID":"cca857a6720dbddcf9af3a4dd7fe9400","permalink":"https://tfjmp.org/publication/2017-scidata/","publishdate":"2017-09-05T00:00:00Z","relpermalink":"/publication/2017-scidata/","section":"publication","summary":"In the last few decades, data-driven methods have come to dominate many fields of scientific inquiry. Open data and open-source software have enabled the rapid implementation of novel methods to manage and analyze the growing flood of data. However, it has become apparent that many scientific fields exhibit distressingly low rates of reproducibility. Although there are many dimensions to this issue, we believe that there is a lack of formalism used when describing end-to-end published results, from the data source to the analysis to the final published results. Even when authors do their best to make their research and data accessible, this lack of formalism reduces the clarity and efficiency of reporting, which contributes to issues of reproducibility. 
Data provenance aids reproducibility through systematic and formal records of the relationships among data sources, processes, datasets, publications and researchers.","tags":null,"title":"If these data could talk","type":"publication"},{"authors":["X Han","T Pasquier","T Ranjan","M Goldstein","M Seltzer"],"categories":null,"content":"","date":1499644800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1641365173,"objectID":"8dabdbaa272c52976f80408fac910fa6","permalink":"https://tfjmp.org/publication/2017-hotcloud/","publishdate":"2017-07-10T00:00:00Z","relpermalink":"/publication/2017-hotcloud/","section":"publication","summary":"We present FRAPpuccino (or FRAP), a provenance-based fault detection mechanism for Platform as a Service (PaaS) users, who run many instances of an application on a large cluster of machines. FRAP models, records, and analyzes the behavior of an application and its impact on the system as a directed acyclic provenance graph. It assumes that most instances behave normally and uses their behavior to construct a model of legitimate behavior. Given a model of legitimate behavior, FRAP uses a dynamic sliding window algorithm to compare a new instance’s execution to that of the model. Any instance that does not conform to the model is identified as an anomaly. 
We present the FRAP prototype and experimental results showing that it can accurately detect application anomalies.","tags":null,"title":"FRAPpuccino: Fault-detection through Runtime Analysis of Provenance","type":"publication"},{"authors":["T Pasquier","D Eyers","J Bacon"],"categories":null,"content":"","date":1491264000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1606283695,"objectID":"1c2031558ed35fb1cf8aa905645c7a63","permalink":"https://tfjmp.org/publication/2017-ic2e/","publishdate":"2017-04-04T00:00:00Z","relpermalink":"/publication/2017-ic2e/","section":"publication","summary":"Unikernels are a rapidly emerging technology in the world of cloud computing. Unikernels build on research on library operating systems to deliver smaller, faster and more secure virtual machines, specifically optimised for a single application service. These features are especially useful in cost or resource constrained environments. However, as with any new technology, early adopters need to master many technical details, and understand many aspects of the mechanisms used to build and deploy unikernels. Both of these factors may slow adoption rates. In this paper, we present our initial experiments into the use of an approach for building unikernels that is accessible to those whose technical expertise is focused on web development. 
We present PHP2Uni: a tool chain that takes a website built from PHP files—PHP remains the most widely used web language—and builds a resource-efficient unikernel image from them, while requiring little knowledge of the underlying operating system software complexity.","tags":null,"title":"PHP2Uni: Building Unikernels using Scripting Language Transpilation","type":"publication"},{"authors":["J Singh","T Pasquier","J Bacon","J Powles","R Diaconu","D Eyers"],"categories":null,"content":"University of Cambridge Computer Laboratory Publication of the Year Award\n","date":1481500800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1641365173,"objectID":"ca859f43cf7aade8d3303befb4daa642","permalink":"https://tfjmp.org/publication/2016-mw/","publishdate":"2016-12-12T00:00:00Z","relpermalink":"/publication/2016-mw/","section":"publication","summary":"Internet of Things (IoT) applications, systems and services are subject to law. We argue that for the IoT to develop lawfully, there must be technical mechanisms that allow the enforcement of specified policy, such that systems align with legal realities. The audit of policy enforcement must assist the apportionment of liability, demonstrate compliance with regulation, and indicate whether policy correctly captures legal responsibilities. As both systems and obligations evolve dynamically, this cycle must be continuously maintained. This poses a huge challenge given the global scale of the IoT vision. The IoT entails dynamically creating new services through managed and flexible data exchange. Data management is complex in this dynamic environment, given the need to both control and share information, often across federated domains of administration. We see middleware playing a key role in managing the IoT. Our vision is for a middleware-enforced, unified policy model that applies end-to-end, throughout the IoT. 
This is because policy cannot be bound to things, applications, or administrative domains, since functionality is the result of composition, with dynamically formed chains of data flows. We have investigated the use of Information Flow Control (IFC) to manage and audit data flows in cloud computing; a domain where trust can be well-founded, regulations are more mature and associated responsibilities clearer. We feel that IFC has great potential in the broader IoT context. However, the sheer scale and the dynamic, federated nature of the IoT pose a number of significant research challenges.","tags":null,"title":"Big Ideas paper: Policy-driven middleware for a legally-compliant Internet of Things","type":"publication"},{"authors":["T Pasquier","J Bacon","J Singh","D Eyers"],"categories":null,"content":"","date":1465171200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1641365173,"objectID":"508e2def036dacb84673b03c2a6dd2c5","permalink":"https://tfjmp.org/publication/2016-sacmat/","publishdate":"2016-06-06T00:00:00Z","relpermalink":"/publication/2016-sacmat/","section":"publication","summary":"The usual approach to security for cloud-hosted applications is strong separation. However, it is often the case that the same data is used by different applications, particularly given the increase in data-driven (‘big data’ and IoT) applications. We argue that access control for the cloud should no longer be application-specific but should be data-centric, associated with the data that can flow between applications. Indeed, the data may originate outside cloud services from diverse sources such as medical monitoring, environmental sensing etc. Information Flow Control (IFC) potentially offers data-centric, system-wide data access control. It has been shown that IFC can be provided at operating system level as part of a PaaS offering, with an acceptable overhead. 
In this paper we consider how IFC can be integrated with application-specific access control, transparently from application developers, while building from simple IFC primitives, access control policies that align with the data management obligations of cloud providers and tenants.","tags":null,"title":"Data-Centric Access Control for Cloud Computing","type":"publication"},{"authors":["J Singh","T Pasquier","J Bacon","H Ko","D Eyers"],"categories":null,"content":"","date":1464739200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1641365173,"objectID":"f1685847ddb1e3b9488ab9601d9f6941","permalink":"https://tfjmp.org/publication/2016-iot/","publishdate":"2016-06-01T00:00:00Z","relpermalink":"/publication/2016-iot/","section":"publication","summary":"To realize the broad vision of pervasive computing, underpinned by the “Internet of Things” (IoT), it is essential to break down application and technology-based silos and support broad connectivity and data sharing; the cloud being a natural enabler. Work in IoT tends toward the subsystem, often focusing on particular technical concerns or application domains, before offloading data to the cloud. As such, there has been little regard given to the security, privacy, and personal safety risks that arise beyond these subsystems; i.e., from the wide-scale, cross-platform openness that cloud services bring to IoT. In this paper, we focus on security considerations for IoT from the perspectives of cloud tenants, end-users, and cloud providers, in the context of wide-scale IoT proliferation, working across the range of IoT technologies (be they things or entire IoT subsystems). 
Our contribution is to analyze the current state of cloud-supported IoT to make explicit the security considerations that require further work.","tags":null,"title":"Twenty security considerations for cloud-supported Internet of Things","type":"publication"},{"authors":["T Pasquier","D Eyers"],"categories":null,"content":"","date":1459987200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1641365173,"objectID":"2bc6f53a75795cd5db8cf47f368921ec","permalink":"https://tfjmp.org/publication/2016-claw/","publishdate":"2016-04-07T00:00:00Z","relpermalink":"/publication/2016-claw/","section":"publication","summary":"The adoption of cloud computing is increasing and its use is becoming widespread in many sectors. As the proportion of services provided using cloud computing increases, legal and regulatory issues are becoming more significant. In this paper we explore how an Information Flow Audit (IFA) mechanism, that provides key data regarding provenance, can be used to verify compliance with regulatory and contractual duty, and survey potential extensions. We explore the use of IFA for such a purpose through a smart electricity metering use case derived from a French Data Protection Agency recommendation.","tags":null,"title":"Information Flow Audit for Transparency and Compliance in the Handling of Personal Data","type":"publication"},{"authors":["T Pasquier","J Singh","J Bacon","D Eyers"],"categories":null,"content":"","date":1459728000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1641365173,"objectID":"081d28931df8bf060a527bfc37e0acdf","permalink":"https://tfjmp.org/publication/2016-ic2e/","publishdate":"2016-04-04T00:00:00Z","relpermalink":"/publication/2016-ic2e/","section":"publication","summary":"With the rapid increase in uptake of cloud services, issues of data management are becoming increasingly prominent. 
There is a clear, outstanding need for the ability for specified policy to control and track data as it flows throughout cloud infrastructure, to ensure that those responsible for data are meeting their obligations. This paper introduces Information Flow Audit, an approach for tracking information flows within cloud infrastructure. This builds upon CamFlow (Cambridge Flow Control Architecture), a prototype implementation of our model for data-centric security in PaaS clouds. CamFlow enforces Information Flow Control policy both intra-machine at the kernel-level, and inter-machine, on message exchange. Here we demonstrate how CamFlow can be extended to provide data-centric audit logs akin to provenance metadata in a format in which analyses can easily be automated through the use of standard graph processing tools. This allows detailed understanding of the overall system. Combining a continuously enforced data-centric security mechanism with meaningful audit empowers tenants and providers to both meet and demonstrate compliance with their data management obligations.","tags":null,"title":"Information Flow Audit for PaaS clouds","type":"publication"},{"authors":["T Pasquier","J Singh","J Bacon"],"categories":null,"content":"","date":1448841600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1641365173,"objectID":"0c8502ef23b017378c1e1a384c3db694","permalink":"https://tfjmp.org/publication/2015-cloudcom/","publishdate":"2015-11-30T00:00:00Z","relpermalink":"/publication/2015-cloudcom/","section":"publication","summary":"There is a clear, outstanding need for new security mechanisms that allow data to be managed and controlled within the cloud-enabled Internet of Things. Towards this, we propose an approach based on Information Flow Control (IFC) that allows: (1) the continuous, end-to-end enforcement of data flow policy, and (2) the generation of provenance-like audit logs to demonstrate policy adherence and contractual/regulatory compliance. 
Further, we discuss the role of Trusted Platform Modules (TPMs) in supporting such a system, by providing hardware roots of trust. TPMs can be leveraged to validate software configurations, including the IFC enforcement mechanism, both in the cloud and externally via remote attestation.","tags":null,"title":"Clouds of Things Need Information Flow Control with Hardware Roots of Trust","type":"publication"},{"authors":["T Pasquier","J Singh","D Eyers","J Bacon"],"categories":null,"content":"","date":1444348800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1641365173,"objectID":"059bb0fa17c7b17fd8dd3293c2917869","permalink":"https://tfjmp.org/publication/2015-tcc/","publishdate":"2015-10-09T00:00:00Z","relpermalink":"/publication/2015-tcc/","section":"publication","summary":"A model of cloud services is emerging whereby a few trusted providers manage the underlying hardware and communications whereas many companies build on this infrastructure to offer higher level, cloud-hosted PaaS services and/or SaaS applications. From the start, strong isolation between cloud tenants was seen to be of paramount importance, provided first by virtual machines (VM) and later by containers, which share the operating system (OS) kernel. Increasingly it is the case that applications also require facilities to effect isolation and protection of data managed by those applications. They also require flexible data sharing with other applications, often across the traditional cloud-isolation boundaries; for example, when government, consisting of different departments, provides services to its citizens through a common platform. These concerns relate to the management of data. Traditional access control is application and principal/role specific, applied at policy enforcement points, after which there is no subsequent control over where data flows; a crucial issue once data has left its owner’s control, by cloud-hosted applications and within cloud services. 
Information Flow Control (IFC), in addition, offers system-wide, end-to-end, flow control based on the properties of the data. We discuss the potential of cloud-deployed IFC for enforcing owners’ data flow policy with regard to protection and sharing, as well as safeguarding against malicious or buggy software. In addition, the audit log associated with IFC provides transparency and offers system-wide visibility over data flows. This helps those responsible to meet their data management obligations, providing evidence of compliance, and aids in the identification of policy errors and misconfigurations. We present our IFC model and describe and evaluate our IFC architecture and implementation (CamFlow). This comprises an OS level implementation of IFC with support for application management, together with an IFC-enabled middleware.","tags":null,"title":"CamFlow: Managed Data-Sharing for Cloud Services","type":"publication"},{"authors":["J Singh","J Powles","T Pasquier","J Bacon"],"categories":null,"content":"","date":1442361600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1641365173,"objectID":"4ddf116a9954afc7354363714c8944bc","permalink":"https://tfjmp.org/publication/2015-ccm/","publishdate":"2015-09-16T00:00:00Z","relpermalink":"/publication/2015-ccm/","section":"publication","summary":"As cloud computing becomes an increasingly dominant means of providing computing resources, the legal and regulatory issues associated with data in the cloud become more pronounced. These issues derive primarily from four areas: contract, data protection, law enforcement, and regulatory and common law protections for particularly sensitive domains such as health, finance, fiduciary relations, and intellectual property assets. From a technical perspective, these legal requirements all impose information management obligations on data sharing and transmission within cloud-hosted applications and services. 
They might restrict how, when, where, and by whom data may flow and be accessed. These issues must be managed not only between applications, but also through the entire, potentially global, cloud supply chain.","tags":null,"title":"Data Flow Management and Compliance in Cloud Computing","type":"publication"},{"authors":["T Pasquier","J Singh","J Bacon","O Hermant"],"categories":null,"content":"","date":1435363200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1641365173,"objectID":"ebc966ee5f0a781874beb40304c13678","permalink":"https://tfjmp.org/publication/2015-cloud/","publishdate":"2015-06-27T00:00:00Z","relpermalink":"/publication/2015-cloud/","section":"publication","summary":"Concern about data leakage is holding back more widespread adoption of cloud computing by companies and public institutions alike. To address this, cloud tenants/applications are traditionally isolated in virtual machines or containers. But an emerging requirement is for cross-application sharing of data, for example, when cloud services form part of an IoT architecture. Information Flow Control (IFC) is ideally suited to achieving both isolation and data sharing as required. IFC enhances traditional Access Control by providing continuous, data-centric, cross-application, end-to-end control of data flows. However, large-scale data processing is a major requirement of cloud computing and is infeasible under standard IFC. We present a novel, enhanced IFC model that subsumes standard models. 
Our IFC model supports Big Data processing, while retaining the simplicity of standard IFC and enabling more concise, accurate and maintainable expression of policy.","tags":null,"title":"Managing Big Data with Information Flow Control","type":"publication"},{"authors":["J Singh","T Pasquier","J Bacon"],"categories":null,"content":"","date":1428364800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1641365173,"objectID":"a47622cedad99c177dcfc6fa92f3185b","permalink":"https://tfjmp.org/publication/2015-riot/","publishdate":"2015-04-07T00:00:00Z","relpermalink":"/publication/2015-riot/","section":"publication","summary":"To realise the full potential of the Internet of Things (IoT), IoT architectures are moving towards open and dynamic interoperability, as opposed to closed application silos. This is because functionality is realised through the interactions, i.e. the exchange of data, between a wide range of 'things'. Data sharing requires management. Towards this, we are exploring distributed, decentralised Information Flow Control (IFC) to enable controlled data flows, end-to-end, according to policy. In this paper we make the case for IFC, as a data-centric control mechanism, for securing IoT architectures. Previous research on IFC focuses on a particular system or application, e.g. within an operating system, with little concern for wide-scale, dynamic systems. To render IFC applicable to IoT, we present a certificate-based model for secure, trustworthy policy specification, that also reflects real-world IoT concerns such as 'thing' ownership. 
This approach enables decentralised, distributed, verifiable policy specification, crucial for securing the wide-ranging, dynamic interactions of future IoT applications.","tags":null,"title":"Securing Tags to Control Information Flows within the Internet of Things","type":"publication"},{"authors":["T Pasquier","J Singh","J Bacon","D Eyers"],"categories":null,"content":"","date":1425859200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1641365173,"objectID":"11c512c5c5d535acdb750173ed45103c","permalink":"https://tfjmp.org/publication/2015-ic2e/","publishdate":"2015-03-09T00:00:00Z","relpermalink":"/publication/2015-ic2e/","section":"publication","summary":"Security is an ongoing challenge in cloud computing. Currently, cloud consumers have few mechanisms for managing their data within the cloud provider's infrastructure. Information Flow Control (IFC) involves attaching labels to data, to govern its flow throughout a system. We have worked on kernel-level IFC enforcement to protect data flows within a virtual machine (VM). This paper makes the case for, and demonstrates the feasibility of an IFC-enabled messaging middleware, to enforce IFC within and across applications, containers, VMs, and hosts. We detail how such middleware can integrate with local (kernel) enforcement mechanisms, and highlight the benefits of separating data management policy from application/service-logic.","tags":null,"title":"Integrating Middleware with Information Flow Control","type":"publication"},{"authors":["T Pasquier","J Powles"],"categories":null,"content":"","date":1425772800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1641365173,"objectID":"f4a369be7a8043db3b0768a8858f442d","permalink":"https://tfjmp.org/publication/2015-claw/","publishdate":"2015-03-08T00:00:00Z","relpermalink":"/publication/2015-claw/","section":"publication","summary":"The adoption of cloud computing is increasing and its use is becoming widespread in many sectors. 
As cloud service provision increases, legal and regulatory issues become more significant. In particular, the international nature of cloud provision raises concerns over the location of data and the laws to which they are subject. In this paper we investigate Information Flow Control (IFC) as a possible technical solution to expressing, enforcing and demonstrating compliance of cloud computing systems with policy requirements inspired by data protection and other laws. We focus on geographic location of data, since this is the paradigmatic concern of legal/regulatory requirements on cloud computing and, to date, has not been met with robust technical solutions and verifiable data flow audit trails.","tags":null,"title":"Expressing and Enforcing Location Requirements in the Cloud using Information Flow Control","type":"publication"},{"authors":["T Pasquier","J Singh","J Bacon"],"categories":null,"content":"","date":1425772800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1641365173,"objectID":"fe4440a6a1ab0c3f222fb27dc67bd38f","permalink":"https://tfjmp.org/publication/2015-fpaas/","publishdate":"2015-03-08T00:00:00Z","relpermalink":"/publication/2015-fpaas/","section":"publication","summary":"The need to share data across applications is becoming increasingly evident. Current cloud isolation mechanisms focus solely on protection, such as containers that isolate at the OS-level, and virtual machines that isolate through the hypervisor. However, by focusing rigidly on protection, these approaches do not provide for controlled sharing. This paper presents how Information Flow Control (IFC) offers a flexible alternative. As a data-centric mechanism it enables strong isolation when required, while providing continuous, fine grained control of the data being shared. 
An IFC-enabled cloud platform would ensure that policies are enforced as data flows across all applications, without requiring any special sharing mechanisms.","tags":null,"title":"Information Flow Control for Strong Protection with Flexible Sharing in PaaS","type":"publication"},{"authors":["T Pasquier","J Bacon","D Eyers"],"categories":null,"content":"","date":1418601600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1641365173,"objectID":"66e3b11fb87110b8157e1be890dda81a","permalink":"https://tfjmp.org/publication/2014-cloudcom/","publishdate":"2014-12-15T00:00:00Z","relpermalink":"/publication/2014-cloudcom/","section":"publication","summary":"Security concerns are widely seen as an obstacle to the adoption of cloud computing solutions and although a wealth of law and regulation has emerged, the technical basis for enforcing and demonstrating compliance lags behind. Our Cloud Safety Net project aims to show that Information Flow Control (IFC) can augment existing security mechanisms and provide continuous enforcement of extended, finer-grained application-level security policy in the cloud. We present FlowK, a loadable kernel module for Linux, as part of a proof of concept that IFC can be provided for cloud computing. Following the principle of policy-mechanism separation, IFC policy is assumed to be expressed at application level and FlowK provides mechanisms to enforce IFC policy at runtime. FlowK's design minimises the changes required to existing software when IFC is provided. 
To show how FlowK can be integrated with cloud software we have designed and evaluated a framework for deploying IFC-aware web applications, suitable for use in a PaaS cloud.","tags":null,"title":"FlowK: Information Flow Control for the Cloud","type":"publication"},{"authors":["J Singh","J Bacon","J Crowcroft","A Madhavapeddy","T Pasquier","W Kuan Hon","C Millard"],"categories":null,"content":"","date":1414800000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1641365173,"objectID":"6dcb4e215c7042c63e319ac529d54a29","permalink":"https://tfjmp.org/publication/2014-tr/","publishdate":"2014-11-01T00:00:00Z","relpermalink":"/publication/2014-tr/","section":"publication","summary":"The emergence and rapid uptake of cloud computing services raise a number of legal challenges. Recently, there have been calls for regional clouds; where policy makers from various states have proposed cloud computing services that are restricted to serving (only) their particular geographic region. At a technical level, such rhetoric is rooted in the means for control. This paper explores the technical considerations underpinning a regional cloud, including the current state of cloud provisioning, what can be achieved using existing technologies, and the potential of ongoing research. 
Our discussion covers technology at various system levels, including network-centric controls, cloud platform management, and governance mechanisms (including encryption and information flow control) for cloud providers, applications, tenants, and end-users.","tags":null,"title":"Regional clouds: technical considerations","type":"publication"},{"authors":["T Pasquier","J Bacon","B Shand"],"categories":null,"content":"","date":1398124800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1641365173,"objectID":"d9a532769b9583bab8938c90478d5bf2","permalink":"https://tfjmp.org/publication/2014-modularity/","publishdate":"2014-04-22T00:00:00Z","relpermalink":"/publication/2014-modularity/","section":"publication","summary":"This paper reports on our experience with providing Information Flow Control (IFC) as a library. Our aim was to support the use of an unmodified Platform as a Service (PaaS) cloud infrastructure by IFC-aware web applications. We discuss how Aspect Oriented Programming (AOP) overcomes the limitations of RubyTrack, our first approach. Although use of AOP has been mentioned as a possibility in past IFC literature we believe this paper to be the first illustration of how such an implementation can be attempted. We discuss how we built FlowR (Information Flow Control for Ruby), a library extending Ruby to provide IFC primitives using AOP via the Aquarium open source library. Previous attempts at providing IFC as a language extension required either modification of an interpreter or significant code rewriting. FlowR provides a strong separation between functional implementation and security constraints which supports easier development and maintenance; we illustrate with practical examples. In addition, we provide new primitives to describe IFC constraints on objects, classes and methods that, to our knowledge, are not present in related work and take full advantage of an object oriented language (OO language). 
The experience reported here makes us confident that the techniques we use for Ruby can be applied to provide IFC for any Object Oriented Program (OOP) whose implementation language has an AOP library.","tags":null,"title":"FlowR: Aspect Oriented Programming for Information Flow Control in Ruby","type":"publication"},{"authors":["J Bacon","D Eyers","T Pasquier","J Singh","I Papagiannis","P Pietzuch"],"categories":null,"content":"","date":1388966400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1641365173,"objectID":"898b97584ea4425634caf79130a38372","permalink":"https://tfjmp.org/publication/2014-tnsm/","publishdate":"2014-01-06T00:00:00Z","relpermalink":"/publication/2014-tnsm/","section":"publication","summary":"Security concerns are widely seen as an obstacle to the adoption of cloud computing solutions. Information Flow Control (IFC) is a well understood Mandatory Access Control methodology. The earliest IFC models targeted security in a centralised environment, but decentralised forms of IFC have been designed and implemented, often within academic research projects. As a result, there is potential for decentralised IFC to achieve better cloud security than is available today. In this paper we describe the properties of cloud computing-Platform-as-a-Service clouds in particular-and review a range of IFC models and implementations to identify opportunities for using IFC within a cloud computing context. Since IFC security is linked to the data that it protects, both tenants and providers of cloud services can agree on security policy, in a manner that does not require them to understand and rely on the particulars of the cloud software stack in order to effect enforcement.","tags":null,"title":"Information Flow Control for Secure Cloud Computing","type":"publication"}]