Publications

Below is a list of all the publications I have written (in full or in part).

On the nature of AI code copilots

Abstract

The recent release of GitHub Copilot, an ‘AI pair programmer’ trained on billions of lines of publicly viewable code, has brought to the forefront discussion of the very nature of Machine Learning and Artificial Intelligence systems. This paper addresses two issues in particular: whether such systems constitute a compiled form of their training data or are more akin to a computer program’s source code, and whether they violate copyright.

Available here pdf and html.

Measuring content preservation in textual style transfer

Abstract

Style transfer in text, rewriting text that is written in a particular style (such as that of the works of Shakespeare) in another style, currently relies on the cosine similarity between the sentence embeddings of the original and transferred sentences to determine whether the content of the sentence, its meaning, has been preserved. This assumes, however, that such sentence embeddings are style invariant; if they are not, measurements of content preservation can be inaccurate. To investigate this, we compared the average similarity of multiple styles of text from the Corpus of Diverse Styles using a variety of sentence embedding methods and found that embeddings created from aggregated word embeddings are style invariant, but those created by sentence embeddings are not.

Available here (paywalled) pdf.
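
As an illustration of the measurement described in the abstract above, here is a minimal sketch of scoring content preservation with cosine similarity. It is not the code from the paper: the toy three-dimensional word vectors, the mean-pooling aggregation, and the names TOY_EMBEDDINGS, embed_sentence and content_preservation are all assumptions made up for this example. A real setup would use trained embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_pool(word_vectors):
    """Aggregate word vectors into one sentence vector by averaging."""
    if not word_vectors:
        raise ValueError("no known words to aggregate")
    dim = len(word_vectors[0])
    count = len(word_vectors)
    return [sum(vec[i] for vec in word_vectors) / count for i in range(dim)]


# Hypothetical toy vectors, chosen by hand so that modern and archaic
# forms of the same word sit close together. Not real embeddings.
TOY_EMBEDDINGS = {
    "you": [1.0, 0.0, 0.0],
    "thou": [0.9, 0.1, 0.0],
    "are": [0.0, 1.0, 0.0],
    "art": [0.1, 0.9, 0.0],
    "kind": [0.0, 0.0, 1.0],
    "cruel": [0.0, 0.2, -1.0],
}


def embed_sentence(sentence, table=TOY_EMBEDDINGS):
    """Sentence embedding as the mean of its known word vectors."""
    vectors = [table[w] for w in sentence.lower().split() if w in table]
    return mean_pool(vectors)


def content_preservation(original, transferred):
    """Score in [-1, 1]; higher means the meaning is judged closer."""
    return cosine_similarity(embed_sentence(original),
                             embed_sentence(transferred))


if __name__ == "__main__":
    same_meaning = content_preservation("you are kind", "thou art kind")
    changed_meaning = content_preservation("you are kind", "you are cruel")
    print(same_meaning, changed_meaning)
```

With these hand-picked vectors, the restyled sentence scores higher than the one whose meaning changed. That is the behaviour the metric relies on, and it only holds if the embeddings themselves do not shift with style.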