Web scraping workshop report


A couple of weeks ago I was delighted to be able to invite Chris Hanretty up to Durham to deliver a web scraping workshop. I've already blogged about how happy I was with the popularity of the course (it filled up in a matter of hours), but on the morning I was still nervous to see who would turn up and what they would get from the day.

On the day 25 people turned up, from a wide variety of backgrounds. There were people from Law, Anthropology, Education, Psychology, Archaeology and Modern Languages, amongst others, as well as a couple of STEMers. There was also a wide range of computer experience, from people who already did some coding to those who had never seen HTML tags before. Chris's materials and teaching style were very well received, and by the end of the day most of the participants were working on real scripts scraping stuff off the web, a massive achievement! Afterwards I happened to bump into one of the anthropologists who had been there, and he told me that he had been fiddling with some web scraping code the following night, which was great to hear.

I think there is a real appetite for this kind of stuff at Durham. One participant commented that they would like a similar course looking at Visual Basic, which I think could be really useful too, although I suspect 'real' coders turn their noses up at it. I wonder what other 'low-hanging fruit' there is that could entice less technically minded people to use simple coding scripts to make their working lives easier?

Posted in Uncategorized | 1 Comment

Tracing a false fact in wikipedia

There's a college at Durham University named after the campaigning feminist Josephine Butler, so one afternoon I found myself on Wikipedia looking her up. I was quite surprised to read the following 'fact':


I happen to be from Southend. Royal Artillery Way (or, to give it its less fancy name, the A1159) is a short stretch of dual carriageway. I looked it up on Google Maps to check: no museum there. I googled it more broadly: no mention of the museum except on the Wikipedia page and from people who had cut and pasted it (a total of 77 hits!). Most worryingly, the Wikipedia page doesn't link to any evidence for the museum's existence, even though citing sources is supposed to be one of Wikipedia's rules.

So after a bit of faff (the IP address of my office desktop was blocked) I managed to log in to Wikipedia and try to find out when this misinformation was added. It was added on 20th October 2008, just shy of 5 years ago, and the original edit included some more information about the museum:


Now this is more obviously untrue ('lazers' is misspelt, for a start), but a few weeks later somebody with a different account name embellished it further:


This was obviously a step too far for somebody, and a couple of weeks later it was edited back to the 'fact' that I stumbled across (and have now deleted) nearly 5 years later.

I think this raises some interesting questions about Wikipedia in general. Why did the person who cut it back delete only half of the false fact and leave the rest? How did this unlikely fact last so long? Was it left alone because it sounded vaguely plausible once the more puerile stuff was deleted? How many other plausible untruths are there on Wikipedia? And perhaps most importantly, why Southend? Any other town and I wouldn't have checked, and who knows how long that 'fact' would have lasted?
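For anyone wanting to repeat this kind of detective work without paging through the history by hand, here is a rough Python sketch of how it could be automated with the MediaWiki Action API (`action=query`, `prop=revisions`). It assumes the English Wikipedia endpoint, walks a page's history oldest-first, and returns the first revision whose wikitext contains a given phrase. The function names and the search phrase are my own hypothetical choices, not part of any Wikipedia tool.

```python
import json
import urllib.parse
import urllib.request

# Assumption: English Wikipedia's Action API endpoint.
API = "https://en.wikipedia.org/w/api.php"


def fetch_revisions(title, limit=50):
    """Yield (timestamp, user, wikitext) for each revision of `title`, oldest first.

    Hypothetical helper. 50 is the usual per-request cap when content is requested;
    the loop follows the API's "continue" tokens until the history runs out.
    """
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "timestamp|user|content",
        "rvslots": "main",
        "rvlimit": limit,
        "rvdir": "newer",  # oldest revision first
        "format": "json",
        "formatversion": 2,
    }
    while True:
        url = API + "?" + urllib.parse.urlencode(params)
        req = urllib.request.Request(url, headers={"User-Agent": "revision-tracer-sketch/0.1"})
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        for page in data["query"]["pages"]:
            for rev in page.get("revisions", []):
                text = rev.get("slots", {}).get("main", {}).get("content", "")
                yield rev["timestamp"], rev.get("user", ""), text
        if "continue" not in data:
            break
        params.update(data["continue"])  # carries rvcontinue into the next request


def first_revision_containing(revisions, phrase):
    """Return the first (timestamp, user, text) whose text contains `phrase` (case-insensitive), else None."""
    needle = phrase.lower()
    for rev in revisions:
        if needle in rev[2].lower():
            return rev
    return None
```

Something like `first_revision_containing(fetch_revisions("Josephine Butler"), "Royal Artillery Way")` would then return the timestamp and account name of the edit that introduced the phrase. A simple linear scan is used rather than a bisection because text can appear, change and disappear again across revisions, exactly as this 'fact' did.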


Web scraping for arts, humanities and social sciences workshop

Web scraping is a versatile tool for taking data from websites and putting it into a spreadsheet for analysis. It has the potential to gather large amounts of useful data. For example, I'm thinking of scraping the IMDb database to look for zombie films, when they were released, their box office takings, etc. to see if there are any interesting patterns over time. I did something similar manually in the past; with web scraping it can be automated, and I can tinker with the parameters more easily (e.g. what counts as a zombie film?).
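To give a flavour of what the IMDb idea might involve, here is a minimal Python sketch using only the standard library: it pulls the cells out of an HTML table and writes them to CSV, ready for a spreadsheet. The markup in `SAMPLE` is made up for illustration (real film-listing pages may well be laid out quite differently), and `FilmTableParser` and `table_to_csv` are hypothetical names. In practice the HTML would be downloaded first, for example with `urllib.request`.

```python
import csv
import io
from html.parser import HTMLParser


class FilmTableParser(HTMLParser):
    """Collect the text of each <td> cell, grouping cells into rows by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows use <th>, so they end up empty and are skipped
                self.rows.append(self._row)
            self._row = None


def table_to_csv(html, header):
    """Parse the table rows out of `html` and return them as CSV text under `header`."""
    parser = FilmTableParser()
    parser.feed(html)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(header)
    writer.writerows(parser.rows)
    return out.getvalue()


# Hypothetical page fragment, standing in for a downloaded listing.
SAMPLE = """
<table>
  <tr><th>Title</th><th>Year</th></tr>
  <tr><td>Night of the Living Dead</td><td>1968</td></tr>
  <tr><td>28 Days Later</td><td>2002</td></tr>
</table>
"""
```

Running `print(table_to_csv(SAMPLE, ["title", "year"]))` gives a header row followed by one row per film. Deciding 'what counts as a zombie film' then becomes a filtering step on these rows rather than a change to the scraper itself.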


As part of my SSI fellowship I'm delighted to be inviting Chris Hanretty up to Durham to deliver a workshop on web scraping, something he uses in his politics research.

The workshop is on July 1st, and all 25 places were filled within hours of the announcement, which is a great indication of the pent-up interest in this tool. I even got a couple of emails from people who couldn't make it but were really supportive of the idea. I'd be really interested to hear about the different potential applications across the arts, humanities and social sciences. So much stuff is online now that the potential for web scraping is enormous!


Special Research Environments and Spaces – ATC, Swinburne University of Technology

What makes a great research environment? What kinds of technological, social and physical factors support interdisciplinary collaborative research? This is the focus of the EPSRC-funded spires project. As part of my trip to Australia later in the year, and thanks to a little bit of extra dosh from spires, I will be visiting the Advanced Technologies Centre at Swinburne University, Melbourne.


It's a pretty snazzy building, and the press release from its opening a couple of years ago makes some bold claims.

“Researchers from a mix of disciplines have moved in and will be conducting their work behind transparent walls. According to Dr Andrew Smith, Director of Swinburne’s Facilities and Services Group, this approach was taken to ensure that the university’s intensely technological endeavours were on show and not hidden in a back lot.

“It is about the university inviting the public to gaze in and participate for a moment in Swinburne’s long-standing love affair with the technology needs of industrial and post-industrial societies,” he said.

“Alongside the laboratory are social and teaching spaces, […] Interlinking the cluster of buildings is a series of hanging bridges, cobbled alleyways and landscaped lanes that give the feel of inner city Melbourne”

I’m hoping to visit the building for a couple of days in early August, have a look round and speak to a few people. It will be interesting to see how the building has bedded in and how effective it has been in creating a great research environment.

At the moment, though, I'm still not particularly sure what a great research environment is, especially for social scientists. I'm thinking that a pub is a good starting point!



Sociology and Software


As part of my SSI fellowship I've started to think about the role of computer programming in sociology. I thought I'd blog my thoughts and invite comments from the wider community!

Sociology is a pretty broad discipline: at Lancaster the department included a number of anthropologists (e.g. the excellent Lucy Suchman), while here at Durham the sociology department is more quantitative. It's actually quite hard to imagine an anthropologist in the Durham sociology department, or a quants person in Lancaster's (with apologies if I've overlooked anybody who is!).

A future blog post will look at anthropology and coding, so for now I will limit myself to (the rest of?) sociology, which still includes theoretical and empirical work (both qualitative and quantitative). Clearly sociologists use a variety of tools, especially in their writing.

I've heard about Professor Bob Jessop's personal hand-coded database for managing his reading and references, which he developed decades ago, although use of EndNote is more widespread (my brief flirtation with Mendeley a few years ago suggested that there weren't many social scientists using it at the time).

Qualitative tools such as NVivo, NUDIST and ATLAS are used throughout the community to code qualitative data (although this isn't the same as computer coding, which does lead to some confusion!), and tools such as SPSS, MATLAB and R are commonly used for quantitative data. These quantitative tools each have a coding interface which is much closer to computer coding.

The question I'm left to ponder, though, is who is developing these tools? Is development driven by the community? And are there coder/sociologists developing new and interesting tools that are pushing the boundaries of what sociologists do? I'm aware of some e-social science projects, although they never really felt driven by the wider sociology community, and I'm not sure of their ongoing impact…

I feel that this kind of tool development is quite advanced in areas that use online data, but less so elsewhere. In fact I hope to put on a workshop later this summer about web scraping, which will encourage social scientists to work with data scraped from the internet.

It would be great to hear about examples of sociologists who code, either in their own time or as part of their role, especially if this is looking at data and areas which aren’t wholly online already.


Sociology and Public engagement, ESRC festival of social science

The ESRC festival of social science has recently put out another call for proposals for events this November. Up to 2k is available for events which appeal to non-academic audiences.

In the last couple of years I've been successful in getting support for a zombie film screening at the Newcastle Lit and Phil, and a night in a pub exploring pub quizzes.

In each case the event was a great opportunity to meet and connect with people that I would not have engaged with otherwise. The Zombie event attracted lots of zombie fans and helped with a zombie related book chapter I have written (the book is out soon! check it out, Zombies in the Academy). I also ended up being interviewed by various media outlets, which was a great experience.

The pub quiz night was good fun and was featured in an ESRC podcast, which was a great experience to be part of. I still think there's a Radio 4 programme in it!


At the Foundation Centre I work with colleagues from a range of disciplines, including quite a lot of natural scientists. They are actively involved in lots of public engagement activities such as Durham's Celebrate Science, which is a great event (I even volunteered!).

I think that as social scientists we have an especially strong incentive to engage with the public, because the public should be interested in what we're working on. I teach introductory sociology classes, and it's a great feeling to be the person who introduces sociology to a new cohort each year. The ESRC festival of social science is a great opportunity to promote the range of social research undertaken across the UK. I'll be applying again this year, and so should you! (I'm happy to respond to any queries if I can.)


Student media focus groups – some notes

As part of my Pinterest project (blogged about previously here and here) I carried out a couple of focus groups this week with my students to explore their use of social media in general, their use of these tools in their learning, and the Pinterest resource that was created to help their learning on my Anthropology class. I thought it would be a good idea to blog some early thoughts from the process, as they might be of interest to others and it will be a while before it all gets written up and properly disseminated.

I should probably make a couple of methodological comments. I carried out two focus groups (with 6 and 7 participants) with students who had taken my anthropology class last term. In each focus group the students were all from the same class and knew each other well, which really helped the discussion flow. Whereas for my previous YouTube project I used somebody unknown to the group (my collaborator @elainertan) to conduct the focus groups, this time I decided to moderate the groups myself. This was a lot of fun and I think the groups went well (I already had a good rapport with them from teaching them the previous term), but it's worth mentioning because students with any negative experiences might have been less keen to share them.

The first striking thing was that when I asked them to tell me the range of social media and online tools they use, there was a massive range! Aside from the usual suspects (Facebook and Twitter, although neither was universal) there were some surprising blasts from the past, with MySpace and Bebo getting mentions. There were also mentions of country-specific networks such as RenRen and Weibo (I'm working on a project looking at Weibo, which I've blogged about). There were also a couple of networks I'd not heard of, such as Xanga and the Japanese mixi.

Another surprising (to me at least) finding was the popularity of mobile apps. Instagram got a brief mention, but both groups talked a lot about WhatsApp, which they used to keep in touch with people, sharing text and photos for free (the free bit was mentioned a lot). The anthropologist Danny Miller recently blogged about the popularity of WhatsApp in Trinidad, suggesting that Trinidad is ahead of the game. Perhaps Danny and I are just late to the WhatsApp party? Anyway, it's always good to hear about new tools that students are using.


These social media were also being used to help with their learning. YouTube got a few mentions in this regard, as did online forums. Even our much-maligned VLE (Blackboard) got quite a few positive mentions! I know!

For the purposes of my project and research interests, the really interesting stuff was all the stories of how students shared the things they found. One student had friends at other universities who would send him articles they found; other students would share videos and webpages they had found, but only with close friends. Another student told how they would take a picture of the whiteboard for a student who was absent (yes, there would have been an interactive whiteboard in the room, unused! *this wasn't me*) and share it with them, but not with anybody else; sharing it more widely hadn't occurred to them.

This got me thinking about how we might be able to encourage this sharing that is already taking place, and try to help students share these resources with the whole group. One way to perhaps do this is to encourage sharing to take place within the classroom (possibly through the use of their smartphones)? I think that this idea has potential.

The Pinterest resources were very popular, especially during revision, although I should probably treat this enthusiasm with some scepticism as 'the Pinterest project' was paying for their lunch! The students liked the way that the resources were grouped together by topic and included videos, websites etc. They also liked the visual nature of Pinterest. Interestingly, though, they didn't really interact with Pinterest as a social medium, so there was little (if any) repinning or commenting on the resources, which I guess feeds into the discussion in the previous paragraph. This also reflects my experience of student use of SlideShare (where I post all my lecture slides).

Anyway I will do some more analysis and write this all up into some kind of report/ paper, but I’d be interested to hear any comments or responses.


Weibo use by UK universities

I've blogged in the past about the potential of Chinese social media for our recruitment and marketing. I recently commissioned a survey of Weibo use by UK universities (carried out by @yimeizhu). This will eventually be worked up into an article of some sort (plenty of interesting issues have been raised), but I thought it would be a good idea to blog some of the results while they are still current, and get some feedback.

This study involved a manual search on Weibo for an official university presence in the period 27th-29th August 2012. The list of institutions was taken from HESA and from a similar study looking at institutions' presence on other social media sites. This list was matched against the mission groupings as they stood on 6th September 2012; the matching was done manually by comparing against the different groupings' websites.

Out of 163 UK universities, 94 (58%) have some sort of Weibo presence; of these, 41 have verified accounts. There are some differences amongst the universities. 64% of universities in Wales have Weibo; in Scotland the figure is 53%, while in Northern Ireland only one university has a Weibo site (25%). Turning to the different university mission groups, we can see quite a wide variation in Table 1. (This is a shortened version of the table, as extra columns for GuildHE (17%) and non-affiliated institutions (39%) ruined the formatting in WordPress.)

Table 1. Weibo account by mission group (% with Weibo account and mean followers, for the Russell Group, University Alliance and Million+)

Once an account has been set up, it needs followers to be of any value. Each follower has made an active decision to engage with the institution's presence on Weibo. They may be prospective, current or past students, so there are potentially large numbers of followers available to each university.

Followers    Frequency  Percent
0-50                20     21.3
51-500              25     26.6
501-1000            11     11.7
1001-5000           31     33.0
5001+                7      7.4
Total               94    100.0

Table 2. Followers on Weibo

This table gives an idea of the popularity of these accounts. Of particular interest are the accounts with large followings: 31 institutions have between 1,001 and 5,000 followers and 7 have more than 5,000. The institution with the largest number of followers is the University of Huddersfield with 30,469, followed by the University of Central Lancashire (25,442) and Kingston University (16,025). Huddersfield is in the University Alliance, whereas the other two institutions are part of the Million+ group.
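As a quick check on Table 2, and on the headline 58% figure, the percentages can be recomputed from the raw counts. This is only a small Python sketch: the dictionary copies the frequencies from the table, and `percentages` is a helper name of my own.

```python
# Frequencies copied from Table 2 (followers on Weibo, n = 94 accounts).
followers = {
    "0-50": 20,
    "51-500": 25,
    "501-1000": 11,
    "1001-5000": 31,
    "5001+": 7,
}


def percentages(freqs):
    """Express each frequency as a percentage of the total, rounded to 1 decimal place."""
    total = sum(freqs.values())
    return {band: round(100 * n / total, 1) for band, n in freqs.items()}


pct = percentages(followers)
for band, p in pct.items():
    print(f"{band:>10} {followers[band]:>3} {p:>5}")

# Headline figure from the text: 94 of 163 universities have a Weibo presence.
print(round(100 * 94 / 163))  # 58
```

The counts sum to 94 and every rounded percentage matches the table, so the same helper can be pointed at Table 3's frequencies without modification.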

Having an account is just the first step, however, and Table 3 gives a breakdown of the level of activity.

Posting activity                                  Frequency  Percent
Every day or almost every day in the last month          16     17.0
Frequent, and have been tweeting recently                38     40.4
Not very often (but has recently)                         5      5.3
Haven't recently (nothing in last 2 months)              17     18.1
A few tweets in total (fewer than 5)                     11     11.7
No content                                                7      7.4
Total                                                    94    100.0

Table 3. Frequency of posted messages in the last month

We can see from Table 3 that, whilst 18 institutions can probably be categorised as dead or dormant users, 54 clearly have some sort of regular interaction with Weibo. The three institutions with the most followers are all daily or frequent users, as you might expect.
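The 18 and 54 in the paragraph above come from combining rows of Table 3. The grouping below is my reading of which rows were combined, chosen because it reproduces both numbers: daily plus frequent posters count as regular users, and accounts with fewer than five posts or no content count as dead/dormant. The shortened row labels and the `group_total` helper are my own, purely for illustration.

```python
# Frequencies copied from Table 3 (posting activity, n = 94 accounts); labels shortened.
activity = {
    "every day or almost every day": 16,
    "frequent and recent": 38,
    "not very often (but recent)": 5,
    "nothing in last 2 months": 17,
    "fewer than 5 tweets in total": 11,
    "no content": 7,
}

# Assumed grouping, inferred from the counts quoted in the text.
REGULAR = ["every day or almost every day", "frequent and recent"]
DORMANT = ["fewer than 5 tweets in total", "no content"]


def group_total(freqs, keys):
    """Sum the frequencies for the listed categories."""
    return sum(freqs[k] for k in keys)


print(group_total(activity, REGULAR))  # 54
print(group_total(activity, DORMANT))  # 18
```

Under this reading, the remaining 22 accounts (5 occasional posters and 17 with nothing in the last two months) sit in between the two groups.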

So overall the survey found a pretty high level of use of Weibo by UK universities, although there is a big range in the level of interaction. This is clearly an area worth exploring from an admissions and marketing point of view, although there are a few questions remaining.

How do Chinese students (and their parents?) use Weibo when thinking about which UK universities to apply to? Are the sites used by current students or alumni? Are there interactions between these groups? Is Weibo used in similar ways to other social media sites (such as a Facebook page, which we set up a few years ago)? What are the intercultural issues for UK institutions in establishing and maintaining a presence on Weibo and other foreign social media networks?


Zombies in the Academy Book

Just received notification that the Zombies in the Academy book is slightly delayed, but due to be published in early 2013. This is quite a long time after my first blog post about our chapter, in February 2011!

The book now has a page on the University of Chicago Press site, which shows off quite a funky front cover!

The contents look really interesting, and I can't wait to get my hands on a copy, although I did notice that Margaret Atwood is now writing a zombie novel, which might suggest that zombies have jumped the shark.

If zombies are on the way out, I wonder what will be the next metaphor to go viral?


Could Weibo overtake Twitter?

Most people think of Weibo as a 'Chinese Twitter' (if they think about it at all), but what if it made a play for the global market? Does it have the potential to overtake the original?

I've been working on a project (with @yimeizhu) looking at the use of Weibo by UK universities. I'll blog some of the results in the near future, but one upshot is that I've been quite impressed by some of Weibo's features. Rather than thinking of it as an inferior copycat, I've started to wonder whether the features usually described as weaknesses (close control/censorship) could give it mass-market global appeal.

If you look at the most popular university Weibo site, the University of Huddersfield's, you can see that it is a mixture of English and Chinese, although mainly the latter. They currently have over 30,000 followers (compared with an average for the rest of the University Alliance of 2,900, and Russell Group and 1994 Group averages of 1,500 and 2,000 respectively).

You can see that pictures are embedded within the page in a way that isn't currently the default on Twitter (although there are Firefox plugins for that), and Weibo supports animated emoticons in a way which I find quite cool, but which will most likely split opinion.

Screenshot of Huddersfield University's Weibo page

The bigger differences are less obvious.

Firstly, the content on Weibo is closely monitored. The focus in the West has been on the political nature of this, but this monitoring presumably also targets other things such as pornography and spam.

Secondly, most Weibo accounts are verified (in a way that only celebrity accounts are on Twitter), which presumably makes spambots a lot less likely.

These two features taken together remind me of Apple’s app store, which has of course been hugely successful with consumers who appear to be happy to trade off an element of freedom for a dependable and safe product.

This leads me to wonder what might happen if Twitter becomes increasingly clogged with spambots and Weibo were to expand aggressively into non-Chinese markets. What if Weibo's success in China isn't just because of the Great Firewall of China (which can be circumvented fairly easily) but because it's a good product? Could Weibo overtake Twitter?
