Last week, I had the very real pleasure of attending a Word Vectors for the Thoughtful Humanist workshop. This workshop was hosted by Northeastern’s Women Writers Project (WWP) and was sponsored by a grant from the NEH’s Office of Digital Humanities. The principal instructors throughout the week were Julia Flanders and Sarah Connell, respectively the WWP Director and Assistant Director.
I have a lot that I could say about the workshop, and I hope to collect some thoughts in a few short blog posts over the coming days. The idea is that if I try to write some short thoughts rather than say everything, I might end up saying something. But I’ve got to start somewhere, and where I want to start is trying to explain word vectors to myself. I expect that this exercise will help me retain some of what I learned last week, as well as prepare me to share this methodology with my students.
Put very simply, word vectors are a means of representing linguistic data in multi-dimensional space and calculating the similarity between words. The algorithm—normally word2vec or GloVe—looks at a word and its neighbors within a window that the researcher sets (e.g., 5 words to either side of the key word). Each token (an individual occurrence of a word) is collapsed into a type (which is to say that “marriage” appears only once in the model) and then placed in vector space. These placements are essentially random at first; as the window moves across the corpus, each word’s position is adjusted according to the neighbors it tends to keep, so that words appearing in similar contexts end up close together.
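The core intuition—that a word’s meaning can be approximated by counting what falls inside its context window—can be sketched without any machine learning at all. The toy below is not word2vec (which trains a neural network to learn dense vectors); it is a simpler count-based stand-in that I’m using only to make the “window” and “similarity in vector space” ideas concrete. The sentence, window size, and word choices are mine, purely for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt

def cooccurrence_vectors(tokens, window=2):
    """Build one vector per type: counts of the neighbors
    that fall within `window` words on either side."""
    vecs = defaultdict(Counter)
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                vecs[word][tokens[j]] += 1
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse count vectors:
    1.0 means identical contexts, 0.0 means none shared."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# A tiny corpus: "cat" and "dog" share most of their neighbors,
# so their vectors should come out similar.
tokens = "the cat sat on the mat the dog sat on the rug".split()
vecs = cooccurrence_vectors(tokens, window=2)
print(round(cosine(vecs["cat"], vecs["dog"]), 3))  # → 0.866
```

Real word-embedding models do essentially this at scale, except that instead of raw counts they learn compact, dense vectors that generalize far better across a large corpus.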