Saturday, February 11, 2006

Dave Sifry and Tim Bray at Northern Voice

Prepare for a drastic decline in the spirituality quotient and interest quotient as well. (Applause)

Would like questions from you so we know what you care about. I’m Tim Bray, I was here last year, this is Dave Sifry from Technorati. If you are a blogger and don’t have a Technorati ego feed, you need one.

T: Dave, why does it exist? Why do we need to do this?

D: Explain why I built Technorati in the first place. Jan 2002, just left a company, Linuxcare. I had this mailing list, CTOalerts. My prerogative to sign everyone up to it and they had to read it. I would pontificate. What was terrific was the feedback: comments, arguments, perspectives from readers. When I left, people asked if I’d keep doing the alerts. I enjoy writing it, but a) I’m not the CTO any more and b) mailing lists suck. Anybody here who has had to manage a mailing list knows it is a huge pain. Subscriptions, moderation, spam, vacation messages. Sucks. I started looking around at what else was out there. Gotta be another way. Kind of a geek. Know how to set up software, think in Perl. Here’s some free software, source code out there, can set it up on my server, create dynamic web publishing. Blog. Whatever.

The feedback was so enormous. I immediately became a stats whore. What people were saying about me. Enough about all of you. Who’s linking to me? I want to know more about the people who are commenting about me. A social thing. As human beings. That is what was wonderful about what Julie was saying. We tell stories. When I spend four hours writing a long blog post, to hear from someone in Japan or Brazil who it touches. Throwing bottles into the ocean. Even if they think you are wrong. If they write back, they have something in common with you. You would be surprised how many write about knitting.

I’d go to Google and Yahoo. Problem was, it comes down to worldview, in the way search engines were built. How many know how search engines work? (50% raise hands.) In essence they are built on the model of the web as the world’s biggest library. Info retrieval. Where the web came from… digitized documents from libraries. Yahoo built a directory. Even the language we use when we talk about the web is the language of the library. Web pages. Documents. Indexes. Directories. What I realized was happening… Google and Yahoo were all too slow. They could not pick up the immediacy of the conversation, thinking about the web in a different way: people interacting with each other. None of these search engines were taking this into account. Fundamentally it has to do with how they are built. They don’t understand the concept of time. You go to Google and ask what happened on 10/15/2005 and it can’t tell you. Google News and Yahoo News are the periodicals section of the library. Instead of 400 papers, it is 4,000, updated every 15 minutes, but still the same idea.

What if you could shift the metaphor away from the web as library? (Which is powerful. I still use Google and Yahoo every single day. Fantastic search engines. Virtuoso operations.) Shift towards where this comes from. Docs are created by people. People create these things at a given time. What if you built something that understood more than just keywords and hyperlinks? Before Google, we thought about indexing just keywords. Google really capitalized on the understanding that hyperlinks are votes of attention. Actually we don’t have to use AI, we can use links. When hundreds and thousands of people link, that shows relevance. But it is still based on the library. PageRank: the web in terms of pages. What I realized is that pages are created by people. Think about a blog in a new way. Not as a reverse-chronological website with comments, trackbacks and permalinks, but instead build something that understands a blog as the exhaust of a person’s attention over time. Their attention stream. We’re leaving our little droppings along the way.

We’re not actually asking people to do anything different. I read their blog, I understand a lot about them. Tim, I read your blog, I understand more about you: what you think, who you respect or disagree with, via links. What you spend your time on, what has importance. You can do the eBay thing. The people who link to you actually help others understand you. Your links to others help too. Relevance -> popularity. What if you took that idea to people rather than pages? The number of people linking to you as a measure of reputation and authority. That does not equal veracity. A lot of people link to Drudge. If you follow politics and don’t look at Drudge you are missing out on a lot. I built it because I wanted to know who was talking about me.

The funny thing was that a whole lot of other people wanted to know who was talking about them. Who linked, who cared. If you know something about people and time, there is a new set of apps we can build, looking at the web not as a static thing but as a living thing. What Doc Searls calls the World Live Web rather than the World Wide Web.
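The idea of counting people rather than pages can be illustrated with a small sketch. This is not Technorati's actual method, only one way to read the description above: each blog's score is the number of distinct other blogs linking to it, so repeated links from the same blogger count once. The function name and the sample blog names are hypothetical.

```python
from collections import defaultdict

def authority_by_blog(links):
    """Count distinct source blogs linking to each target blog.

    `links` is an iterable of (source_blog, target_blog) pairs, one per
    hyperlink found in a post. Repeated links from the same source count
    once, so the score reflects how many people link, not how many pages.
    """
    linkers = defaultdict(set)
    for source, target in links:
        if source != target:  # ignore self-links
            linkers[target].add(source)
    return {target: len(sources) for target, sources in linkers.items()}

# Hypothetical sample data, invented for illustration.
links = [
    ("alice.example", "tim.example"),
    ("alice.example", "tim.example"),  # second post by the same linker
    ("bob.example", "tim.example"),
    ("tim.example", "dave.example"),
    ("dave.example", "dave.example"),  # self-link, ignored
]
scores = authority_by_blog(links)
print(scores)
```

As the talk stresses, a score like this measures attention, not veracity.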

T: Speaking of stats whores, you are in the midst of the “State of the Blogosphere” report. You crunch the numbers and tell us how many of us there are, how many are coming, how many have left. A whole lot, and growing fast.

D: It’s pretty interesting. Technorati is tracking 27.6 million blogs today. We don’t pretend to say we are tracking all of them. We work hard to get all of the public blogs into the index. Obviously I can’t extrapolate to the untracked, in particular Korea: lots of blogging going on, but because of the structure, three big companies, we’re not tracking Korea as well. We have found that, on top of the 27.6 million, the blogosphere is growing by 75,000 new blogs every day. 86,400 seconds in a day. About one a second, every single day, around the world.

The interesting thing to note is how many are still actively blogging after 3 months, past the tire kicking or service experimentation. Just over 50% are still blogging after 3 months. That’s about 13.7 million people blogging at least once every three months. 11% post once a week or more: 2.8 million. Daily or more, something nice: just under a million. Clearly the posting volume is a much more interesting indicator than the number of blogs. Tracking 1.2 million posts/articles created every single day. 50,000 posts per hour. Used to talk about a 24-hour news cycle. Moves fast. If you weren’t in. We have changed the way we look at the cycle. Measure in mh not in hours. 15 posts a second.
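The rates quoted above are easy to check. The sketch below uses only the figures from the talk (75,000 new blogs and 1.2 million posts per day). It suggests "about one a second" and "15 posts a second" are rounded: the arithmetic gives a little under 0.9 new blogs and about 14 posts per second.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400, as quoted in the talk

def per_second(per_day):
    """Convert a daily count into an average per-second rate."""
    return per_day / SECONDS_PER_DAY

new_blogs_per_day = 75_000   # figure quoted in the talk
posts_per_day = 1_200_000    # figure quoted in the talk

blogs_per_sec = per_second(new_blogs_per_day)
posts_per_hour = posts_per_day / 24
posts_per_sec = per_second(posts_per_day)

print(f"new blogs per second: {blogs_per_sec:.2f}")
print(f"posts per hour:       {posts_per_hour:,.0f}")
print(f"posts per second:     {posts_per_sec:.1f}")
```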

How do I make sense out of all of this? Some people might be daunted by the numbers. How can I get noticed? So interesting and exciting. As a type of media, as opposed to radio, TV, papers, it is incredibly many-to-many. We talk a lot about this at conferences. I was charting this authority curve. People like to talk about the long tail. We took a listing of the top bloggers and started carving out this tail and printing it out. There are the people at the top, the Instapundits. In effect, from a media perspective, they have become very similar to MSM in their habits. One-to-many. Can’t respond to all the posts. Boing Boing turned off comments. They look like CNN. The number of links they get is enormous. We know about the long tail. People posting personal thoughts, the local school board. Then there are about 115,000 people who are in the magic middle, with between 30 and 1,000 people linking to them. The exciting part. Hasn’t been talked about. A lot of the people in this room. Influential, authoritative in niche areas: tech and gadgets, fashion, gardening, or people writing about local communities. Lots posting about school boards. They become local authorities. Interesting. Becomes very two-way. The amount of traffic they see is manageable enough to conduct a strong conversation. Authoritative enough. How can we help to expose these people more, to get them found better? If you decide you want to start writing about Vancouver, how do you do that?

Sept 2005, launched a blog directory service called Blog Finder. Not to hire librarians with a taxonomy, but to hand it back to the bloggers. Let you tag yourself. At the same time, not just take your word for it. I could say I’m a fashion blogger, but, right. What if we could look to see if you write about what you said you write about, and also compare you to others who write about fashion and how they link to you? Many links, high topical authority. We opened this up to the world. What was remarkable was how many people joined in. 870,000 self-tagged blogs. From that, there are over 2,500 really interesting tags with a critical mass of bloggers writing. Politics and technology are easy. Gardening, fashion, nanotech, erotica, you name it.
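The "don't just take your word for it" check can be sketched roughly as follows. This is an illustration under stated assumptions, not the Blog Finder's actual algorithm: topical authority is taken here to be the number of distinct blogs carrying the same self-assigned tag that link to you. The function name, data shapes and blog names are all hypothetical.

```python
def topical_authority(blog, tag, self_tags, links):
    """Number of distinct blogs tagged `tag` that link to `blog`.

    `self_tags` maps blog -> set of self-assigned tags.
    `links` is an iterable of (source_blog, target_blog) pairs.
    """
    # Peers are the other blogs that claim the same topic.
    peers = {b for b, tags in self_tags.items() if tag in tags and b != blog}
    return len({src for src, dst in links if dst == blog and src in peers})

# Hypothetical sample data, invented for illustration.
self_tags = {
    "a.example": {"fashion"},
    "b.example": {"fashion", "knitting"},
    "c.example": {"politics"},
    "d.example": {"fashion"},
}
links = [
    ("b.example", "a.example"),
    ("d.example", "a.example"),
    ("c.example", "a.example"),  # not a fashion blog, does not count
    ("c.example", "d.example"),
]

score_a = topical_authority("a.example", "fashion", self_tags, links)
score_d = topical_authority("d.example", "fashion", self_tags, links)
print(score_a, score_d)
```

In this toy data, d.example claims the fashion tag but no fashion peers link to it, so its self-description alone earns it nothing.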

T: 2,500 interesting subjects in the world. This threshold is close: how many people read you, how busy you are. Easy to get over the edge. We become one-way.

D: I regret how little time I have to blog. Inverse relationship between employment and blogging. (Laughter) I don’t know how Scoble does it.

Q: In terms of building indexes, how do you translate tags from multiple languages? To see who is blogging about things intelligently when tagged in other languages. Anyone working on that?

D: Tagging is relatively sloppy. People use different words but are talking about the same thing. How can I get a Rosetta Stone in multiple languages? That’s an excellent question.

T: To a larger extent, tagging as a human activity is a sloppy thing.

D: Clay Shirky has written about this. Here is my direct experience. The question is good in theory. I say rebel, you say revolutionary. We talk about the same thing from different viewpoints. What we found in experience is that it works as long as you make it really easy for people to tag and create accountability in the tagging process. A rel tag creates a hyperlink in your post. Not just a meta keyword hidden only for search engines. Make it an accountable thing. If you falsely tag, it shows up.
When you set up this system to be easy and accountable, people do reasonably good tags. When you look at the tag system as a whole, an emergent system starts to occur. Something greater than the sum of its parts. Automobile, for example. I may tag my post auto/car. You may tag bus/vehicle/auto. Another, bus/car/motor vehicle. When you look at those things there are some transitive relationships. Can start to do some statistical analysis to show word relationships. The same thing happens with languages. All it takes is a couple of people who bridge the gap by tagging in multiple languages and the relationship is formed. The DaimlerChrysler guys, in Germany and the US, all had different names for car parts. Big problem. There was a small group who spoke both. When they tagged in both languages, the system quickly became more intelligent. (TELL THIS TO BEV) The English and German words for doohickey were related through this tagging. A beautiful piece of emergent thinking that came out of this system. No one had to create a formalized dictionary.
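The transitive relationships described above can be sketched with simple co-occurrence counting. This is a minimal illustration, not the analysis Technorati or DaimlerChrysler actually ran. The English tags come from the example in the talk; the German tags `wagen` and `fahrzeug` and both function names are assumptions added to show how one bilingual tagger bridges two vocabularies.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(posts):
    """Count how often each pair of tags appears on the same post."""
    pairs = Counter()
    for tags in posts:
        for a, b in combinations(sorted(set(tags)), 2):
            pairs[(a, b)] += 1
    return pairs

def related(tag, pairs, max_hops=2):
    """Tags reachable from `tag` within `max_hops` co-occurrence steps."""
    neighbours = {}
    for a, b in pairs:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, frontier = {tag}, {tag}
    for _ in range(max_hops):
        # Expand one hop, skipping tags that were already visited.
        frontier = {n for t in frontier for n in neighbours.get(t, ())} - seen
        seen |= frontier
    return seen - {tag}

posts = [
    {"auto", "car"},
    {"bus", "vehicle", "auto"},
    {"bus", "car", "motor vehicle"},
    {"car", "wagen"},        # hypothetical bilingual tagger
    {"wagen", "fahrzeug"},   # hypothetical German-only tagger
]
pairs = cooccurrence(posts)
print(related("fahrzeug", pairs))  # the two tags within two hops
```

A real system would presumably weight pairs by how often they co-occur instead of treating every shared post as an equal link.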

T: That is the only hope. What are you worried about these days? What could go wrong?

D: I keep knocking on wood. It’s remarkable how much blogging has grown. So far, we’ve been tracking since 11/2002. The number of blogs doubles every 5.5 months. (Tim: By Aug 2009 everyone in the world will have one.) Which is why this can’t go on. Every time I roll this up, they ask whether this will continue, I say it can’t, and yet it keeps happening. Still at the beginning of this. Enormous challenges. One of them is spam. Comment and trackback spam. Splogs. This new thing called spings. It flows from the fundamental realization, as Cory Doctorow said, that all healthy ecosystems have parasites. He’s right. The only question is whether it is going to be like the bacteria in your intestines or like red tide. Email spam. The good news, the cool thing about blogging, is that in the end it resolves back to a web page somewhere. Everything you do, write, say has accountability. This idea of accountability is why there is still a high signal-to-noise ratio in the blogosphere. If I say “I think such and such is an idiot,” I can do that as a hit and run on someone else’s blog. On my blog it is on my permanent record. I have to stand up for that tomorrow. That is a powerful force. A very good thing. That doesn’t mean we can’t have anonymous or pseudonymous blogging (safety), but I can temper what I know about you with what you have said over time.
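Tim's quip about August 2009 can be checked with a quick doubling-time calculation. The sketch uses the two figures from the talk (27.6 million blogs, doubling every 5.5 months) plus one assumption that is not from the talk: a world population of roughly 6.5 billion in 2006.

```python
import math

def months_until(target, current, doubling_months):
    """Months for `current` to reach `target` if it doubles every
    `doubling_months` months (pure exponential growth)."""
    return doubling_months * math.log2(target / current)

blogs_now = 27.6e6         # figure quoted in the talk (February 2006)
doubling_months = 5.5      # doubling period quoted in the talk
world_population = 6.5e9   # assumption: rough 2006 world population

months = months_until(world_population, blogs_now, doubling_months)
print(f"about {months:.0f} months")
```

That works out to a bit over 43 months, which counted from February 2006 lands in the second half of 2009, in line with the joke and with the point that the curve cannot keep going.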

Concerns if it goes the other way: a proliferation of spam, like in our inboxes. A big nefarious one coming down the pike. All about network neutrality. Few in the room have heard of it. Potentially one of the most dangerous threats to the net.

In the US there’s been a collapse in the number of telecom providers carrying the backbone of the net. What you are starting to see is these companies, to whom we have given local monopolies to invest in fibre infrastructure and whom we pay monthly, now coming and saying: we think we deserve to do preferential pricing, so that if you are not on our network you get slower service or pay more. Imagine getting on the net, going to Google, and getting a pop-up that says you are going off network and you have to pay more. Or, more nefariously, they go to Google and say: hey, you are doing video. You have to pay us to be preferred. Or BellSouth decides Barnes and Noble is their preferred book provider. You type Amazon and a B&N pop-up happens. This sort of breakage of the traditional layering of net infrastructure is bad. It protects the winners who can afford to pay the protection money. Those of us building small, young companies can’t pay predatory pricing. We’re stuck at a disadvantage. Hasn’t been talked about a lot. The CEOs of BellSouth and Comcast are going up in front of Congress saying “of course we need to do this for ROI on fibre investments.” The only people who can stop it are you. The big companies getting extorted won’t jump up and down. They aren’t going to come to us to pay, they will go around the back roads to big companies.

Q (I think Ted Leung): How many spam blogs?

D: We started tracking spam blogs in July 2004. Basically two types. People doing keyword stuffing, SEO stuff. People scraping other people’s content and trying to make money off AdSense. Grown significantly. The only way we can get around it is not by technical means, but by getting down to the economics of doing it. We set up a spam squashing summit (2x). It takes an ecosystem approach. We have a hotline with all of the major hosting companies. We work with other indexing companies. No one company can solve all these problems. Blogger all of a sudden has a huge spam problem with scraping blogs. We get in touch with them and give them the list and they kill them. Talking to the MSN Spaces guys. It’s got to be an ecosystem approach. Sometimes competing, but in the end if spam proliferates, it’s bad for everyone. If it turns into Usenet, that is bad. The affiliate program guys are the best to work with on this. They don’t want people violating the TOS by posting copyrighted material. They can kill that entire AdSense ID. They know the name, SS #, where the check is sent to. Stronger motivation to stop. The ones that are harder are the guys doing adult sites, Viagra, offshore pharmacies. Combo of technology and the ecosystem working together. Have people participate in that system. To recommend that.

(I’m getting tired typing… may not make it to the end)

T: Those of us with popular blogs see more spam. (Nancy says it happens to small blogs too)

T: I want to know how many people are reading my RSS feeds. If we are going to build businesses around them, it would help to know. See any hope?

D: Hard problem. There are some companies working on it but no one has solved it. Dick Costolo at FeedBurner, the Pheedo guys, ad servers that understand RSS. Interesting when you dig another layer deeper. Not just about RSS. The fact is that your RSS readers, when we talk about push, aren’t really push. The RSS reader goes to my site and pulls the latest feed down. If I looked at those RSS stats, they may be 20 times more than actual because of the frequent checking. There are people who subscribe and don’t read. Bloglines…

The real question: who is reading, how often, and what are they reading? Some put little graphics into the RSS feed, like FeedBurner. When you actually open up the item and view it, it goes back and pings FeedBurner and gets info about the reader. But you don’t really know. You could be just scrolling down. NewsGator pulls in all the different posts at once even if you read just one. Certain large application sets that everyone has to code for: Bloglines, NewsGator, NetNewsWire. This is also one of the big concerns about advertising in RSS feeds. How do you understand these metrics? 2006 will be a good year for this. As with the Internet Advertising Bureau, these can be solved at a cross-company level.
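The point about polling inflating feed stats can be made concrete with a small sketch. It compares raw feed hits with distinct clients per day. The log format and the `client_id` field are hypothetical stand-ins for whatever identifies a reader (IP address plus user agent, for example), not any real server's format.

```python
from collections import defaultdict

def feed_stats(requests):
    """Compare raw feed hits with distinct clients per day.

    `requests` is an iterable of (day, client_id) tuples, one per fetch
    of the feed. Returns {day: (raw_hits, distinct_clients)}.
    """
    hits = defaultdict(int)
    clients = defaultdict(set)
    for day, client in requests:
        hits[day] += 1
        clients[day].add(client)
    return {day: (hits[day], len(clients[day])) for day in hits}

# Hypothetical log: one reader polling hourly, another polling twice.
requests = (
    [("2006-02-11", "reader-a")] * 24
    + [("2006-02-11", "reader-b")] * 2
)
stats = feed_stats(requests)
print(stats)  # 26 raw hits, but only 2 distinct clients
```

Even the deduplicated count is only an estimate. A web-based aggregator such as Bloglines fetches once on behalf of many subscribers, and, as noted above, a subscriber is not necessarily a reader.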

Q: What about federated blog publishers? People who get bloggers, pay them miserably to blog, then place ads.

D: A number of variants. I won’t name names. Some are more co-op oriented. All pitch in, split ad revenues. Different types of networks. Natural outgrowth of the shift in publishing economics. Seeing more authors and publishers gaining power from the people formerly known as the press, with their high overhead. Then the question about hiring an ad salesperson, getting an ad server running. A lot of people don’t have the expertise. So a guild system is starting to form. Natural business evolution.


I have to stop here. Hands dying. These guys talk really fast.


1 Comment:

Anonymous Denise said...

You are truly awesome. Someone should pay you to attend every blog event and live blog it, tired hands or not. Thank you.

4:53 PM  



Full Circle Associates
4616 25th Avenue NE, PMB #126 - Seattle, WA 98105
(206) 517-4754 -