Content recycling by @nytimes

Recently, I happened to notice a pattern of recycled content on The New York Times’ Twitter feed. I decided to take a look at one particular story to see if I could make sense of the pattern.

The story is a quiz titled “Can You Tell What Makes a Good Tweet?” that is itself drawn from a study done at Cornell by CS folks (and apparently with cooperation from one author at Google). I took the quiz, and despite my research on Twitter, the algorithm they developed for predicting a retweetable tweet won out (not surprising, since it had a lot more practice and doesn’t get bored with making picks like I did).

I was struck by the irony of reporting on an algorithm that can pick the winning tweets done by an agency that is trying to float that report as clickbait, so I compiled a list of the @nytimes tweets promoting this quiz. I did it without accessing the Times or Twitter APIs, but it would be interesting for Comm or Journalism folks to do a study of recycling behaviors on media and other outlets. Unlike the Cornell study (which compared two tweets posting the same link at different times) the tweets I’m comparing are exactly the same (same content, same image, same link). Here’s what I came up with:

Total tweets of this article (and associated image): 7
Days from first to last tweet: 5

First tweet (all subsequent tweets are identical):

Tweet details:

Timestamp*               Retweets   Favorites
8:26 AM – 2 Jul 2014     256        222
11:31 AM – 2 Jul 2014    204        167
10:13 PM – 2 Jul 2014    256        222
4:02 AM – 3 Jul 2014     182        164
6:42 PM – 3 Jul 2014     205        197
9:59 PM – 4 Jul 2014     278        267
10:43 PM – 5 Jul 2014    221        256

*NOTE: all times in CDT

My extremely unscientific study shows a couple of interesting things:

  • Look how the RT and Favorite numbers dip for the morning tweets on 7/2 and 7/3. Perhaps the 7/2 tweet came too close to the initial tweet. Was the 7/3 tweet meant for early risers / European readers?
  • The numbers seem to attenuate until you get to the evening July 4th/5th tweets; on a holiday weekend, are people done hanging with the relatives and scanning Twitter before bed?
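For anyone who wants to poke at these numbers, here is a rough Ruby sketch of the morning-versus-evening comparison. The counts are copied from the table above; the AM/PM split and the method names are my own choices for illustration, and seven tweets are far too few to support any real conclusion.

```ruby
# Retweet and favorite counts copied from the table above (times in CDT).
TWEETS = [
  { time: "8:26 AM",  date: "2 Jul 2014", retweets: 256, favorites: 222 },
  { time: "11:31 AM", date: "2 Jul 2014", retweets: 204, favorites: 167 },
  { time: "10:13 PM", date: "2 Jul 2014", retweets: 256, favorites: 222 },
  { time: "4:02 AM",  date: "3 Jul 2014", retweets: 182, favorites: 164 },
  { time: "6:42 PM",  date: "3 Jul 2014", retweets: 205, favorites: 197 },
  { time: "9:59 PM",  date: "4 Jul 2014", retweets: 278, favorites: 267 },
  { time: "10:43 PM", date: "5 Jul 2014", retweets: 221, favorites: 256 }
].freeze

# Arithmetic mean of a list of numbers.
def mean(values)
  values.sum.to_f / values.size
end

# Split tweets into AM and PM groups (an arbitrary split chosen for this
# sketch) and average each metric within the group.
def means_by_half_day(tweets)
  groups = tweets.group_by { |t| t[:time].end_with?("AM") ? :am : :pm }
  groups.transform_values do |group|
    {
      retweets:  mean(group.map { |t| t[:retweets] }),
      favorites: mean(group.map { |t| t[:favorites] })
    }
  end
end

means_by_half_day(TWEETS).each do |half, m|
  puts format("%s: %.1f retweets, %.1f favorites",
              half.to_s.upcase, m[:retweets], m[:favorites])
end
```

A finer split (say, by hour, or holiday versus non-holiday) would need many more tweets than this one story provides.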

As I progressed through the quiz, I noticed that time of day was an important predictive factor for my guesses; as far as I can tell, the Cornell authors didn’t consider that at all (only the time between tweets). If no one is awake or your tweet doesn’t come to the top of the heap some other way (through RTs, hashtag coincidence, your presence in a list, etc.), my guess is it might as well be gone. There are so many factors independent of tweet content that can influence propagation (including holiday weekends where folks retreat to their separate bedrooms and mobile devices after a long day of family bonding).

As I operate on the assumption of agency and strategy in SM use by organizations, I’m guessing the times these tweets went out weren’t accidental. I think that the social media folks at the NYT are thinking about how often and when they want to tweet out content, and possibly even adjusting their strategy before they retire that content from their SM streams. Based on the tweet I looked at, however, it’s hard to guess what that strategy is.

The Cornell study raises some interesting questions about how to predict tweets that will maximize content dissemination, but also misses some of the many complicated factors that go into predicting proliferation.

The IRB and emotional manipulation

The talk about the now famous Facebook study on emotional contagion got me thinking about the question of the role of institutional review boards (the IRB) and our responsibilities to participants in a study. I’m going to share a story here about an IRB approved study I participated in some years back as an undergraduate. I’m not trying to get anyone in trouble, and I’m not really bothered by the experience now, but I share it because it illustrates the point that even IRB approved research, when poorly designed, can and does screw up and cause emotional impacts that the researchers cannot fully understand or predict.

The National Institutes of Health describes the responsibility of the researcher as minimizing harm and maximizing the benefits of research for the participants; this is a direct result of the Tuskegee syphilis experiment, where participants were not told for decades that they could be treated (at very low cost) for their syphilis infections so that the researchers could observe the long-term impact of the disease.

One of the big arguments I’ve heard is that the participants in the Facebook study were not given the option of informed consent: they didn’t know the risks of the study (the researchers probably didn’t have a total handle on those either) and they couldn’t opt out. I just read an excellent analysis on the FB study by danah boyd on the difference between obtaining approval from an IRB, and actually thinking critically as a researcher about the ethical impacts your research will have.

Informed consent does not mean that a study is without risk or emotional impact to a participant. Those risks should be anticipated and mitigated, but as my anecdote will demonstrate, that sometimes doesn’t happen like it should.

When I was about 19 years old, I was enrolled in a typical intro to Psych class at my university. As part of the learning experience about experiments (and as a way to drum up volunteers), I was required to participate in something like six hours of experiments. You didn’t actually have to do the experiments, but you had to show up and opt out on the informed consent document to get credit. I can’t remember if there was an alternative if you didn’t want to go at all (maybe write a paper), but nevertheless I had a positive attitude and felt like I could help researchers solve important problems if I participated. Most of the studies were just multiple choice quizzes or writing answers to timed questions. One study was a cooperative brainstorming task that I did in front of a one-way mirror. Nothing too outlandish.

Then I participated in a study that was not so pleasant. I showed up to the study room, where I was told that I would be watching videos with a group of two other students, and I would be asked how I felt about the actions of different characters in those videos via a form. The films they showed were all of people getting verbally ridiculed, then getting angry and beating someone up. I think they were all Hollywood motion pictures (one was definitely Dazed and Confused), but I can’t remember. The questions on the form were about whether I thought the person was justified in attacking someone.

During the screening, a confederate (unknown to me at the time) offered me some unwrapped candy out of a bag. We’ll call him Person A. I politely refused Person A because I thought it was really weird to eat unwrapped candy from someone I didn’t know, and I’m not a big candy person to begin with. After the experiment was over, another confederate (again, unknown to me) stopped me in the hallway while I was walking out. We’ll call him Person B. Person B pointed my attention to a disc on the table that Person A had ostensibly forgotten. The label on the disc said “Final Paper.” I asked Person B if he knew Person A, and he said no. He then told me that Person A had told him he would be going to a meeting in the basement of the building I was in. I told Person B that I would take the disc down to him, and Person B followed me into the elevator. I probably should have been more suspicious of all of this, but I figured that once I walked out of the lab room into the hallway, the experiment was over.

On the elevator, Person B asked me if I would pledge to donate to his AIDS walk charity. I didn’t have two nickels to rub together in college, but I said I would since I felt bad. I put my donation on the form and Person B got out at the ground floor.

Wait for it, it gets even weirder from this point on.

I got to the basement and I couldn’t find the room, so I asked a custodian who was mopping the floors if he knew where it was. He told me, in broken English, that the room didn’t exist as far as he knew. Afterwards, I thought I would walk around one more time just to be sure. It turns out I had just passed the room and not noticed it since the lights were out and no one was there. There was a note on the door that said the meeting had been moved to a room on the top floor of the building. I was a bit angry that I had to go back up to the 12th floor (or whatever it was), but I got back on the elevator.

When I got out of the elevator and walked to the room, Person B was waiting for me around a corner in the hallway. At first, I couldn’t figure out what was going on, and I was a bit disoriented. Person B told me that everything I had been doing for the last 10 minutes or so after I left the lab room was an experiment. Apparently the candy Person A offered me was somehow related to the violent movies, the disc left by Person A and whether I returned it was related to him offering me candy, my willingness to pledge to Person B’s AIDS walk was related to him encouraging me to return the disc to Person A, and moving the meeting to the top floor was testing how far I would go to return the disc. All this was in addition to my answering the questionnaire.

However, I wasn’t simply told all of this in my “debriefing,” I had to ask whether some of the scenarios were part of the experiment. I asked about the custodian and Person B said “What custodian?” I asked if the disc really did belong to Person A, and he said “you can just give that to me.” I went back and forth with him a couple times to make sure, because at that point I really couldn’t sort out all the different components of the experiment.

I would say I’m no more paranoid than your average person, but I was extremely uncomfortable in that moment. I was given more consent documents to sign by Person B, which I did because I wanted to leave as soon as possible. Also, even though I was told (probably repeatedly) that my participation wouldn’t affect my course grade, the experimental design was confusing to the point where I didn’t know if I could opt out at that point or even how many experiments this counted as toward the course requirement.

Looking back years later, I was probably naive in thinking that the events after I left the lab room were separate from the experiment, but I let myself be fooled in the moment because I’m naturally trusting and wanted to help a fellow student out if I could. The experiment played on my disposition in that regard.

But wait, it gets better yet.

I asked Person B if the experiment was over now, and he said “Oh yeah, we’re all done here if you want to take off,” or something casual to that effect. As I left the building I paused in the lobby to cue up my CD player (dating myself here). Whether by design, or by unfortunate accident, Person B happened to exit the building right after me, and even walked in the same direction for two blocks. I know this because I kept looking over my shoulder to see when he would leave.

When I got back to my apartment, I would say that I was significantly emotionally impacted. I kept replaying the events of the experiment in my mind. I started wondering if the custodian was a secret confederate, and his broken-English explanation that the room didn’t exist was to see what my attitude to an ESL speaker was. I also wondered how Person B knew to wait for me on the top floor for my debriefing. What if I had just said “F**k it” and left the building with the disc? I concluded that there could have been someone silently observing from the darkened room I stood next to. What if I had accidentally discovered that observer? What would the emotional impact have been of discovering someone surveilling me from the shadows? As stupid as it sounds now, I even thought about the custodian calling up on a walkie talkie, saying something like “the package is en route.”

I had a strong sense that, even though Person B told me the experiment was over, it was still going on in some capacity. Later, before I took the final exam in the course, the professor told us that the course itself was an experiment on college learners and asked us to sign informed consent documents. This was minutes before the exam started!!! As you might guess, having participated in this bizarre experiment, my suspicions about the experiment never ending were only heightened at the worst possible moment: right before I had to take a two-hour-long multiple choice exam.

I probably could have complained about the study, but I didn’t really want that kind of attention. Whether or not a complaint could have actually impacted my grade, I had perceived negative repercussions associated with making a formal complaint. I was compromised as a participant both during and subsequent to the secondary experiments outside of the lab room, because I didn’t feel like I could opt out.

This study was approved by my university’s IRB.

My point in sharing this is not to disparage human subjects research or the IRB system. I’ve come to think that the experiment was probably much more structured on paper, but was executed poorly. It’s possible there was a more structured protocol for debriefing, which was not followed in my case. Nevertheless, the sole fact that this study received IRB approval doesn’t mean that it should have been done, for a few reasons:

  1. The experimental design was shit. Embedding so many sub-experiments in the primary experiment meant, ultimately, that you couldn’t infer a damn thing from any of my actions past (I would say) the point where I agreed to return the disc. Even that action was primarily due to my empathizing with Person A about losing a term paper, and had nothing to do with any candy offers.
  2. Debriefing would be so complicated that you have to wonder why they grouped all these sub-experiments together in the first place. I should have been made to understand the totality of the experiment and the ending conditions clearly before I was allowed to walk out of the room (or given something to read that contained that information at the very least). I definitely should not have been debriefed by a confederate, someone who knowingly deceived me during the experiment.
  3. The conditions have to be so carefully maintained, they make this experiment an incredibly complex machine that achieves very little. Having the confederate/debriefer/whoever the hell he was walk out of the building and follow me was idiotic. Person B should have gone out another exit or even waited ten minutes before leaving.

Even though I don’t technically think my rights as a participant were violated, and I’m not significantly affected by the experience now (other than it’s a funny story to tell at parties), it was seriously disconcerting at that time. I was made to feel unsure of my privacy at the university for at least a couple of months. I felt observed, and it was a feeling that took some time to get over.

As it relates to the Facebook study, I can totally empathize with people feeling like they were toyed with, and being told the effects were minimal does not do much to dispel that feeling. The reason we obtain informed consent and avoid using the word “subjects” is exactly to remove the detachment that makes researchers feel like those people are the other, the thing to be manipulated and run through a maze. We’re careful to distinguish that we manipulate conditions and observe responses, but it’s naive to think that you can design an experiment with such minimal impact that the participants don’t need to be informed or debriefed.

Emotional impact, even if it’s negative, is just a part of an experiment. Some of those other experiments where I answered multiple choice surveys repeatedly asked strange questions, like “Do you ever feel like the television is talking to you?” or “Do you ever feel like your limbs are detached from your body?” I wasn’t disturbed by the questions, but they were strange enough that I wanted to know why I was being asked them. Most of the time I got a debriefing statement (I think the test I mentioned was for schizophrenia). I was exposed to a slight emotional impact, but it may have helped doctors better diagnose and treat someone with serious problems. I think most people, if the impact truly is small and they are aware of the type and duration of the experiment, have no problem participating if it can help someone who needs it.

The IRB is supposed to help us define how to run human subjects research responsibly, but, as boyd suggests, we all need to think more about the actual execution of the research and what responsibilities (outside of just legal and IRB) we have to participants.

Facebook and other social networking sites shouldn’t stop doing research or publishing it, but they need to be more forthcoming to users. I don’t think informed consent is always the answer, but FB could have had a press conference where they clearly explained what they had done, why they did it, and what contribution it made to society and our understanding of human behavior. They should have sent a notification to all of their users, even those who didn’t participate. boyd even goes so far as to suggest that users should have a hand in determining what types of research Facebook does, but as we learned from the final site governance vote ever, that is probably just a fantasy.

Finding social networking site accounts for a list of organizations

I’m working on collecting data for my dissertation right now, and one major problem that I ran into was finding organizations on Twitter and Facebook. I have heard from more than one person who has a list of organizations (say the top non-profit organizations or Fortune 500 companies) and wants to make a collector to get Twitter posts, but doesn’t have the usernames for those organizations. Twitter lists are great for finding lots of accounts, but there are two major problems: 1) The list you need may not exist, and 2) The accuracy and currency of that list are wholly dependent on the curator. If you are concerned with getting the most accurate sampling of a group of organizations on social networking sites, chances are you have to make your own list.

I first encountered this problem when I was compiling Twitter lists of members of the U.S. House of Representatives and U.S. Senate in the 113th Congress. At the CaSM Lab, we use these lists to collect tweets authored by and directed at members of Congress (MOCs). To compile the lists, I had to do a Google search with the name of the MOC, plus the words “Congress” and “Twitter.” While adding these terms (usually) weeded out people who coincidentally had the same name as MOCs, it did not weed out MOCs’ Congressional information pages and well-meaning websites like Tweet Congress. Even after a focused search, I still had to scan results, verify an account, and copy the URL or username.

For my dissertation, I am pulling from an initial list of 2,720 non-profit organizations that potentially have SNS accounts. Manually performing a search for each organization and extracting a potential URL for each organization would take far too long. Since there is some degree of human intelligence involved in such a task, paying someone to perform the searches and find URLs would seem to be the only option. Since this is a dissertation, however, I have approximately no funds allocated for this. Likewise, I wanted a method for finding URLs that works on a variety of projects so that I don’t have to pay someone every time I need to make a new list.

I had some previous experience with Ruby and the Watir gem, so I chose that route for automating the search task. Watir is a library that lets you automate a web browser: it can pass information to a website form and monitor the results. It also has some limited scraping abilities, which are perfect for scraping structured information such as search results or tables.

My initial script grabbed the first three URLs from a Google search indiscriminately, but that caused a couple of problems. First, for organizations that return more than one page from their website in the top of Google results, you risk crowding out relevant social networking site URLs (a problem of recall). Second, the script returned lots of URLs from third-party non-profit information sites that had dummy entries for the organizations I searched for (similar to the TweetCongress problem). These non-relevant URLs lowered the instrument’s precision.
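To make the two terms concrete, here is a small Ruby sketch of how precision and recall could be computed for one organization. The method name and the URLs are invented for illustration (they are not output from my script): precision is the share of returned URLs that are relevant, and recall is the share of relevant URLs that were returned.

```ruby
require "set"

# Precision: share of returned URLs that are relevant.
# Recall: share of relevant URLs that were actually returned.
# Empty inputs yield 0.0 rather than dividing by zero.
def precision_recall(returned, relevant)
  returned = returned.to_set
  relevant = relevant.to_set
  hits = (returned & relevant).size
  precision = returned.empty? ? 0.0 : hits.to_f / returned.size
  recall    = relevant.empty? ? 0.0 : hits.to_f / relevant.size
  [precision, recall]
end

# Invented example: three URLs come back, but two of them are pages from
# the organization's own site, and the Facebook page is missed entirely.
returned = [
  "https://example.org/",
  "https://example.org/about",
  "https://twitter.com/example_org"
]
relevant = [
  "https://twitter.com/example_org",
  "https://www.facebook.com/example.org"
]

precision, recall = precision_recall(returned, relevant)
puts format("precision: %.2f, recall: %.2f", precision, recall)
```

In this made-up case only one of the three returned URLs is useful, and one of the two accounts that exist is never seen, which is the combination of problems described above.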

Unfortunately, since I wanted to start the Twitter collector immediately, that meant I still had a large amount of searching and scanning of search results to do when collecting Twitter URLs for my study. For collecting Facebook URLs, I decided to return to the search script and fix these problems.

I recently finished a revised script (available on Github) that returns the first ten URLs for a given search term when the URL matches a predetermined string. In order to increase the instrument’s recall, I expanded the number of URLs it collects from three to ten (the number of URLs on the first page of a Google search). To increase the instrument’s precision, I changed the script to only collect URLs containing a given string (e.g. “”). These changes greatly increased my confidence that when the script returns zero URLs for an organization, there are no social networking sites associated with that organization.
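The filtering step on its own can be sketched in a few lines of plain Ruby. This is not the code from the script on Github: the method name, the `limit:` keyword, and the example match string "facebook.com" are all assumptions made for illustration, and the browser automation that would supply the result URLs is left out.

```ruby
# Look at no more than `limit` result URLs (ten, matching the first page
# of results) and keep only those containing `needle`. The comparison is
# case-insensitive and duplicates are dropped. The name and signature are
# hypothetical.
def matching_urls(result_urls, needle, limit: 10)
  wanted = needle.downcase
  result_urls.first(limit)
             .select { |url| url.downcase.include?(wanted) }
             .uniq
end

# Invented first-page results for one organization.
results = [
  "https://example.org/",
  "https://example.org/donate",
  "https://www.facebook.com/example.org",
  "https://nonprofit-directory.example.com/example-org"
]

puts matching_urls(results, "facebook.com")

# An empty array means no URL on the first page matched the string.
puts matching_urls(results, "linkedin.com").empty?
```

An empty result for an organization is the "zero URLs" case: nothing on the first page of results contained the match string.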

While this script doesn’t replace the need for human verification, it does eliminate the tedious process of performing initial searches and having to pick through the results to find a potential URL. There is certainly a chance that I’m missing a few accounts by using automation, but, as I learned when searching for MOCs, fatigue is just as likely to result in a false negative as any automation.

Feel free to try the script out and if you do, please let me know how it works for your searches. It’s pretty versatile and can be adapted to most any search task where you need to find URLs for a list of people or organizations. Also, although I haven’t done so, I’m sure it could be modified to work with Ubuntu or as part of a Rails app. Its only limitation is that memory constraints slow it down after about 1,000 searches (a problem I don’t have time to investigate now).

Also, if you are looking for some introductory help on using Watir to automate a web browser, I have a tag on Diigo with links to some helpful resources.

Clever Tweetbots

Earlier in the week, I played this Radiolab segment “Clever Bots” for my students in my summer science fiction course. We were discussing artificial intelligence after reading Neuromancer by William Gibson and the segment discusses robots that approach what Alan Turing described as the threshold for intelligent machines (the ability for a machine to converse with a human and for that human to be unable to distinguish between the machine and another human around 30% of the time–the “Turing test”).

I was discussing language games and computer programming with Nicole earlier today. I was telling her about the interview in the above segment with Sherry Turkle, a professor at MIT, concerning a program named ELIZA: a language game program ca. 1966 that used natural language processing (NLP) to mimic the role of a therapist practicing Rogerian psychotherapy (a form of talk therapy), only a little too closely for the creator’s comfort. Reportage on the program speculated that people would go to phone-booth-like installations to receive therapy rather than visit a human psychiatrist. The creator, Joseph Weizenbaum, was greatly disturbed by the artificiality of this type of interaction.

When I saw James Schirmer, professor of English and prolific tweeter, playing with the app That can be my next tweet, I had to give it a shot. Apparently it is a kind of language game that searches your past Twitter posts and assembles fragments of each post based on their parts of speech (presumably using NLP) into a semi-coherent and incredibly hilarious melange of random babblings. Only about one third of the tweets make any sense, but here are some of the funnier tweets the app generated and that I posted:

And my favorite, which I admittedly modified slightly by omitting a few random letters and characters at the end: