mebaya'kushá: Natural Language Liberation Front nLLF

Saturday, May 14, 2011

Natural Language Liberation Front nLLF

zero-count stitching

Zero-count stitching refers to a compositional process within the practice of digitally mediated literary art. It is applied to an existing text in order to analyze and explore poetic potentials within the text and so generate new texts that realize these potentials. Thanks to the emergence of new cultural vectors in the world of language (chiefly, that is, in the 'global English' of the Internet), such as the Google-indexed corpora, zero-count stitching is able to mine close-to-live natural language data and then feed this information into its expressive processes. As such, zero-count stitching is part of a practice I have chosen to call 'Writing to be Found,' named for the search engines' ability to tell us whether or how frequently certain personally or algorithmically composed sequences of written symbols may be found elsewhere within a now-vast corpus of inscription. On the other hand, to exploit and appropriate, aesthetically, the supply text from which we set out, a prior process of algorithmic, generative, and even performative reading must take place. Such reading may become, itself, a practice of literary art in which, reflexively, Writing to be Found may play some part. We pursue this regenerative reading on the distinct sites of The Readers Project.

The poem and translation that has been displayed in the initial contents panel of the present site (and which should be reliably available here) is, in a sense, a minimal example of a text produced by zero-count stitching. For a more extended example, please refer to the abridged performance text, imageZC0403, and its processProse (both also accessible from the sidebar menu under 'work in progress').

To date, the pieces that I have made with zero-count stitching are composed of three-word phrases, serving also as lines, all of which represent perigrams extracted from the supply text. A text's order-three perigrams are a subset of all the possible three-word combinations in that text. The subset is constrained in accordance with properties typical of (western) texts' typographic characteristics, or their typographic dimension, as we now say.[1] In simple terms, we consider, as perigrams, only those combinations of words from a text that could, potentially, be found in typographic proximity. (Follow the links for more details on The Readers Project site.)

The supply text of 'first wind autumn' is a twenty-word word-for-character translation of a classical Chinese quatrain.

first geese suddenly in pairs
startlingly autumn wind water window
long nights rouse us alone
stars moon filling empty river

As it happens, for current pre-processing, we use a twenty-word window around each word of a text in order to compile the combinations for its perigrams. This means that for a short text such as this translation, all of its three-word combinations are considered order-three perigrams. Only for longer texts are perigrams a significantly smaller subset.
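The windowing step can be sketched roughly as follows. This is only an illustration under stated assumptions, not the pre-processing code of The Readers Project: it treats a perigram as an ordered sequence of three words taken from distinct positions, and it reads 'a twenty-word window around each word' as up to twenty positions on either side. The function name `order_three_perigrams` and the parameter `radius` are hypothetical.

```python
from itertools import permutations

def order_three_perigrams(words, radius=20):
    """Collect ordered three-word sequences whose words all lie in one window.

    Assumption: the window spans `radius` positions either side of a centre
    word, and word order within a perigram is free.
    """
    found = set()
    for centre in range(len(words)):
        lo = max(0, centre - radius)
        hi = min(len(words), centre + radius + 1)
        # Every ordering of three distinct positions inside this window.
        for i, j, k in permutations(range(lo, hi), 3):
            found.add((words[i], words[j], words[k]))
    return found

supply = ("first geese suddenly in pairs "
          "startlingly autumn wind water window "
          "long nights rouse us alone "
          "stars moon filling empty river").split()

perigrams = order_three_perigrams(supply)
```

Under these assumptions every window covers the whole of the twenty-word translation, so all of its ordered three-word sequences are returned; a smaller `radius`, or a longer text, yields a proper subset.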

In any case, to continue with our process, the text is actively 'read' to extract its perigrams and then these are further refined as follows. We disregard combinations where the same word is repeated in immediate proximity (discarding, for example: aab, or baa; allow: aba). At this point we begin to search for all of the sequences we've extracted and derived in the indexes of either Google 'Everything' or Google 'Books.' We reserve only those perigrams that do not yet occur in this corpus. They are not found. They have not been written (yet).
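A minimal sketch of this refinement, assuming the search count is supplied by the caller: `count` below is a hypothetical function that takes a phrase and returns the number of hits reported for it, standing in for whatever query mechanism is actually used against Google 'Everything' or 'Books'. The names `has_adjacent_repeat` and `unwritten` are likewise illustrative.

```python
def has_adjacent_repeat(perigram):
    """True for patterns such as aab or baa; aba is allowed."""
    return any(a == b for a, b in zip(perigram, perigram[1:]))

def unwritten(perigrams, count):
    """Keep perigrams with no adjacent repeat and a search count of zero.

    `count` is a caller-supplied stand-in for a search-engine lookup.
    """
    return [
        p for p in perigrams
        if not has_adjacent_repeat(p) and count(" ".join(p)) == 0
    ]

# Usage with a toy index in place of a live search.
toy_index = {"first geese suddenly": 12}

def toy_count(phrase):
    return toy_index.get(phrase, 0)

candidates = [
    ("first", "geese", "suddenly"),   # already written: dropped
    ("wind", "wind", "water"),        # aab: dropped
    ("water", "wind", "wind"),        # baa: dropped
    ("wind", "water", "wind"),        # aba: allowed
    ("moon", "window", "geese"),
]
kept = unwritten(candidates, toy_count)
```

The repeat test runs before the lookup, so sequences discarded for repetition are never searched for.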

Our cull will now consist of three-word phrases, composed from the words of our supply text, that have not yet been composed by any writers of works indexed by Google. Even for a short text, there are thousands of these. We may further reduce their number and improve their correlation with natural English by first removing any sequences with bad agreement between an indefinite article, “a” or “an”, and a following word with or without an initial vowel. (This is clearly unnecessary in the case of our imagist translation.) We might also (and have in some cases) further improve the correlation of these perigrams with the natural English of their supply text by creating a simple parts-of-speech analysis for all the actually occurring three-word sequences in the supply text and preserving only those remaining zero-count perigrams whose analysis matches one or other of the parts-of-speech sequences that our analysis has supplied. At this point we have the set of zero-count (three-word) perigrams from which our new texts will be generated.
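These two optional filters might look something like the sketch below. It rests on assumptions: article agreement is judged by spelling alone (so cases like 'an hour' or 'a university' would be misjudged), and the parts-of-speech analysis is delegated to `tag`, a hypothetical caller-supplied function that maps a word to a tag. None of the names here come from the original process.

```python
VOWELS = set("aeiou")

def bad_article_agreement(perigram):
    """True if 'a' precedes a vowel-initial word or 'an' a consonant-initial one."""
    for article, following in zip(perigram, perigram[1:]):
        starts_with_vowel = following[:1].lower() in VOWELS
        if article.lower() == "a" and starts_with_vowel:
            return True
        if article.lower() == "an" and not starts_with_vowel:
            return True
    return False

def supply_patterns(supply_words, tag):
    """Tag sequences of the three-word runs that actually occur in the supply text."""
    return {
        tuple(tag(w) for w in supply_words[i:i + 3])
        for i in range(len(supply_words) - 2)
    }

def refine(perigrams, supply_words, tag=None):
    """Drop bad a/an agreement; if a tagger is given, also match supply patterns."""
    kept = [p for p in perigrams if not bad_article_agreement(p)]
    if tag is not None:
        patterns = supply_patterns(supply_words, tag)
        kept = [p for p in kept if tuple(tag(w) for w in p) in patterns]
    return kept
```

Passing `tag=None` applies only the article filter, which mirrors the way the parts-of-speech step is described as something done in some cases rather than always.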

We select the lines of our new poetic text by choosing, in succession, perigrams that include each of the written words of the supply text. Initially, we choose a three-word perigram at random that includes the first word of this text. But then, as we go on to the next word, rather than simply taking another random perigram including this next word, we attempt instead to stitch together our as yet uninscribed, uncomposed sequences by using Google searches (Books or Everything) once again to find existing, previously composed, three-word phrases that straddle and link our proposed enjambement of successive zero-count perigrams. If the last two words of a leading sequence and the first word of a following sequence, or the last word of a leading sequence and the first two words of one following, form a three-word, verse-straddling phrase that is found to have been indexed and counted by Google, then good: we accept the proposed sequent perigram and continue, repeating this subprocess until we reach the end of the supply text.
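The stitching loop can be sketched as below, again as an illustration rather than the code actually used. As before, `count` is a hypothetical caller-supplied lookup returning a hit count for a phrase. The text does not say what happens when no candidate can be stitched, or when a word has no surviving perigram, so the fallback to an unstitched random candidate and the skipping of such words are assumptions of this sketch.

```python
import random

def straddles(leading, following):
    """The two three-word phrases that cross the break between two lines."""
    return [leading[-2:] + following[:1], leading[-1:] + following[:2]]

def stitch(supply_words, zero_perigrams, count, rng=None):
    """Choose one zero-count perigram (a tuple of words) per supply word.

    Prefers candidates linked to the previous line by a straddling phrase
    that `count` reports as already written.
    """
    rng = rng or random.Random()
    lines = []
    for word in supply_words:
        candidates = [p for p in zero_perigrams if word in p]
        if not candidates:
            continue  # assumption: skip words with no surviving perigram
        rng.shuffle(candidates)
        if not lines:
            lines.append(candidates[0])  # first line: a random choice
            continue
        linked = [
            p for p in candidates
            if any(count(" ".join(s)) > 0 for s in straddles(lines[-1], p))
        ]
        # Assumption: fall back to an unstitched candidate if none links.
        lines.append(linked[0] if linked else candidates[0])
    return lines
```

Each line is itself unwritten, while every accepted join between lines is a phrase that has been written, which is the sense in which the zero-count perigrams are stitched together.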

That is how the text of 'first wind autumn' was composed—only one of many possible versions that might have been generated by the process—and it is also how the performance texts cited above were made.

[1] The 'typographic dimension' is discussed on The Readers Project site and also in a forthcoming Siggraph/Leonardo paper, John Cayley and Daniel C. Howe, 'The Readers Project: Procedural Agents and Literary Vectors.'