Monday, December 29, 2003

Economics of Service Oriented Architecture

Service Oriented Architecture (SOA) is changing how enterprise software is designed and deployed. Part of SOA's success is in the technology, driven by the convergence of web services standards into a common, interoperable set of technologies on which to build SOAs. The other part arises from its superior economic model for enterprises. SOA is evolving to the point where new applications will not be deployed as monolithic instances but will become collections of services woven together in a loosely coupled framework. Applications are becoming virtualized and available on demand, and this will break the technological and economic bottleneck caused by the traditional enterprise software model. The problem does not reside with enterprise IT organizations but with the enterprise software model as a whole, though corporate IT is usually made the scapegoat. Until now there has been no alternative to the traditional model of deploying silos of functionality; now IT organizations have a choice and a way out of the current web of complexity.



The service-based model of software is here, and forward-looking enterprises are rapidly adopting it as a means to streamline operations, reduce costs and provide significant differentiation in the marketplace. Successful companies invest in the services that differentiate them and outsource commodity services to organizations that deliver them more reliably and more cost efficiently. Who today would invest in building an overnight parcel delivery service? Likewise, who wants to invest millions of dollars in designing and implementing a CRM system when you can get a service from one of several vendors for a fixed monthly fee that scales with the growth of your organization? When broadly applied to the enterprise infrastructure, this approach is going to solve several significant problems with today's enterprise software. CRM is just a small example and the tip of the iceberg; when software as a service is used to build and deploy applications, the economics are such that it becomes fiscally irresponsible not to consider this approach.



The current model of enterprise software is not aligned with the needs of corporate IT organizations: it is inefficient, cumbersome, and unresponsive to the needs of companies. Measurements of return on investment (ROI) are, in many cases, optimistic at best. By some accounts the percentage of failed projects exceeds that of successful ones. While there have been, and will continue to be, many successful deployments, they are the result of careful planning and a significant investment of a company's best and brightest IT personnel. If the best and brightest are used to ensure successful rollouts, who is creating the business processes and applications that make the business successful? What percentage of the total IT effort is devoted to this?



Apart from the time to deployment and the inherent complexity of current enterprise software offerings, there is an even larger hidden problem. The actual value delivered, taken as a fraction of the total cost of developing the software, is low. Enterprises pay a “portability tax” when buying software; the more complex the software, the higher the “tax”. This “tax” is the cost that goes into making the enterprise software portable and compatible with the underlying operating systems, databases, application servers, mail servers, directory servers, etc. These costs contribute almost no value to the business objectives of the enterprise, yet they can be a significant part of the cost model.



When the costs of developing, deploying and supporting enterprise software are taken into account, the amount devoted purely to ensuring that the software can be run, deployed and supported on a variety of platforms is significant. Based on many years of experience, I would estimate that anywhere between 30-70% of the total costs are devoted to cross-platform issues. The upper end of the range is for software that uses as many native (proprietary) platform features as possible, the low end for software designed with a lowest-common-denominator approach. While this estimate may seem high, consider not just the actual coding effort but everything around it: how much time in the requirements phase is spent discussing and evaluating which combination of platforms to support, which patch releases to support, and then how to design the software accordingly. Every time a new feature is added there is a design discussion on its cross-platform impact. Add the test matrix and the support infrastructure and the costs continue to mount, but the value to the consumer of the software does not.


The impact of all this on the customer is that a significant part of the cost of enterprise software provides no significant business value to the enterprise. This has not been questioned, as there has been no viable alternative to this model. Now a viable alternative is being widely deployed through the software-as-a-service model.



Software as a service has become viable through three trends that have converged to create an opportunity for enterprises to drive significant cost out of their software purchases, respond faster to market conditions, and reduce infrastructure. The three trends are SOA, the broad availability of high-speed connections, and web services. Together they enable software to be delivered as a service: producers make services available on a SOA backbone, and consumers use the features inherent in the network to create applications by weaving together different services. Applications can now be created that are totally virtual and require no hardware or software installation or deployment. Companies such as Grand Central are making the network smarter by deploying the necessary services for a SOA backbone. This is aligned with the objective of corporate IT to build and deploy the services that provide differentiation while leveraging the services provided by others. The framework required to make this happen is not core to any one organization, and as such it should be shared infrastructure.



Deploying software as a service removes the most significant problems with enterprise software: services only need to run on a single platform, so a larger percentage of the effort goes into creating useful business applications rather than portable ones. Using shared infrastructure and other business services lets us leverage the work of others in ways that have not been possible with the current model. The time to market, or time to respond to market demands, is shortened considerably because once again we are leveraging the services of others.



This will not happen overnight, nor am I suggesting that we rip out the current installed base of enterprise software. Far from it: the current installed base provides the core set of services required by the corporation and its partners today. The task is deciding which services are core to your business, investing in those, and using services from others where appropriate. As in all technology revolutions, the last generation usually remains a vital part of the infrastructure, but not the only part. Software as a service is going to significantly reduce the need for enterprise software but not eliminate it. It is a model more economically responsive to the needs of IT, and it enables the corporation to focus on business processes and applications rather than on deploying more enterprise software.


Thursday, October 23, 2003

Keeping my Identity

DIDW was very thought-provoking and I keep coming up with new issues. The latest: whom do I trust to keep my identity? This has both legal and social implications. I see my identity as a cloud of facts that surrounds me and changes in both time and space. The major reason I do not manage my identity myself is that I do not want to be responsible for it. This is both a data management issue and a legal responsibility issue.


Many people worry about third parties such as banks and the government managing parts of their identity. I would worry more if I had to manage my identity myself, keep it secure and be legally responsible for it. There is a lot to be said for having multiple third parties act as your trusted identity providers.

Monday, October 20, 2003

Personal Identity - who am I....

I spent some time last week at Digital ID World. I was there to be part of a panel on Web Services and Identity that focused on enterprise issues. The other part of the conference was looking at the issues of personal identity and how it related to the digital world. The danger of the identity problem is over-simplification.


The nature of personal identity is really a collection of facts about me, with axes in time and space. At different times I have different identities: a friend, an employee, a customer, a relation, etc. In different places: native, business traveler, tourist, etc. Which facts about me are both necessary and provable to establish my identity depends on the problem at hand.


I am not sure the problem can be solved by either a single device or a static set of assertions. I need to be able to reach into the network and present a set of provable assertions that define my identity, and other parties need to be able to ask for a collection of assertions. Once provided with the provable assertions, the identity consumer needs to be able to state what it is going to do with the information, i.e. how persistent it is and whether it will be attached to my identity cloud.
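As a minimal sketch of that exchange (all type and field names here are my own invention, not any actual identity standard):

```java
import java.time.Instant;
import java.util.List;

// Hypothetical data model for the assertion exchange described above.
public class IdentityExchange {

    // One provable fact in the identity cloud, anchored in time.
    record Assertion(String claim, String issuer, Instant validUntil) {}

    // What an identity consumer asks for, and what it promises to do with it.
    record AssertionRequest(List<String> claimsWanted, String retentionPolicy) {}

    // The holder answers with only the provable facts the request actually needs.
    static List<Assertion> respond(AssertionRequest request, List<Assertion> myCloud) {
        return myCloud.stream()
                      .filter(a -> request.claimsWanted().contains(a.claim()))
                      .toList();
    }
}
```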


The idea of an identity cloud that has clusters of provable facts in time and space is the mental model I am creating around identity, as anything else is too simplistic.

Sunday, October 19, 2003

Is persistence becoming a commodity..

Several announcements, such as this one: Big Blue retools database pricing | CNET News.com, point in the direction of databases becoming a commodity. The next step is making persistence simply a part of the information flow. One of the major advantages of XML and XML Schema is that documents are human readable and can be persisted in a single step straight from the information flow.
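A minimal sketch of that single step, assuming the message arrives as a stream (the store path and naming scheme are hypothetical):

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// The XML document is its own storage format: persisting it is just copying bytes,
// with no object-relational mapping step in between.
public class XmlSink {
    public static Path persist(InputStream xmlMessage, Path storeDir) throws Exception {
        Path target = storeDir.resolve("msg-" + System.nanoTime() + ".xml"); // hypothetical naming
        Files.copy(xmlMessage, target);
        return target;
    }
}
```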


This is a disadvantage for the database vendors as they have profited from the complexity of storage. Now if it becomes a commodity in the cloud, a part of the fabric of the network, then what is the actual value of persistence?

Saturday, October 11, 2003

Is Google a more dangerous monopoly than Microsoft?

As a monopoly, Microsoft controls the tools and platform the majority of people use to access the internet. While this has significance for how the browser and the desktop evolve, it is not nearly as significant as control over how we discover information. This thought was sparked by the article Google CEO speaks out on future of search, where Eric Schmidt talks about the need for personalization in search, and by Google's acquisition of Kaltix, a personalization technology for search.


Google is becoming the gatekeeper of information for the new millennium. The first place I, and many others, go to find something is Google; the power and responsibility this gives Google is enormous. The view I and others now have of the world is to some extent controlled by Google. This control of information dwarfs any monopoly Microsoft has on the desktop. There are two filters Google is starting to put on my information flow: one is advertising and the other is personalization.


The lure and power of advertising dollars has resulted in a media industry that (IMHO) is bland and vapid. The content of television is extremely bland due to the power of the advertisers: let's not offend anyone. Already we are starting to wonder whether Google's search results are being ordered by dollars rather than relevance. Can Google resist the power of the advertisers? Today, perhaps, as a private company. When they become public, the need to deliver quarterly numbers will put tremendous pressure on them; will the separation of commerce and search survive?


Personalization of search results also worries me: if people are only served information that pleases them, there is a strong tendency to believe that our world view is correct. I look to a search engine to expand my horizons, not shrink them into my safe zone. Personalization has a tendency to link us to others who think the same and have similar tastes. Is there a personalization setting to challenge my thinking?


Google is doing the right thing to generate revenue and compete economically, but is this the best thing for society as a whole? I, and I am sure others, treat Google like a library, an impartial source of information; it is now becoming a bookstore that tries to merchandise the information it is offering. What happens when the library disappears and we only have the merchandising? If Google is the only search engine around, we are suddenly in a world where our information access is controlled by a single entity with advertising dollars at the heart of its economic model. If Google does not serve up the information, does the information exist?


If Google does become an information filter, we need another way to have direct, unfiltered access to information: the PBS of search. One interesting project in this area is Nutch, founded by Doug Cutting, the creator of Lucene. A free society requires free access to information; let's make sure information access stays open.

Wednesday, September 24, 2003

Prevayler - orthogonal Persistence..

I noticed this today, Slashdot | Prevayler Quietly Reaches 2.0 Alpha, Bye RDBMS?, after my previous post about data impedance. If you cut away the hype there might be something here.


Another similar solution I am playing with just now is db4o, which is a Java and .Net object database. It is very elegant and easy to use.
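To show what "easy to use" means here, a minimal sketch based on the early db4o query-by-example API as I recall it (set/get; later versions renamed these calls), with a made-up Post class and file name:

```java
import com.db4o.Db4o;
import com.db4o.ObjectContainer;
import com.db4o.ObjectSet;

public class Db4oDemo {
    public static class Post {
        public String title;
        public Post(String title) { this.title = title; }
    }

    public static void main(String[] args) {
        ObjectContainer db = Db4o.openFile("blog.yap"); // file name is hypothetical
        try {
            db.set(new Post("Impedance mismatch in Development")); // store: one call, no schema, no SQL
            ObjectSet results = db.get(new Post(null));            // query by example: null fields match anything
            while (results.hasNext()) {
                System.out.println(((Post) results.next()).title);
            }
        } finally {
            db.close();
        }
    }
}
```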

Tuesday, September 23, 2003

Impedance mismatch in Development

In Java is the SUV of programming tools, a widely quoted article, Philip Greenspun raises the issue of complexity. While I favor Java as a programming language, I think he raises a few interesting issues.


The problem is not the programming language but rather the infrastructure we have built around it. To create an application that takes a string as input through a web form and then stores it persistently, a developer must know a wide range of technologies, no matter which particular approach they use. If I use Java it is HTML/JSP/Servlets/Java/JDBC/SQL at a minimum; if I use another approach the stack is similar (HTML/PHP/CGI/Perl/DBI/SQL). The number of transformations that simple string goes through from the web page to the database is significant. The impedance mismatch comes from the need to translate among datatypes. With a real-world problem involving complex types, it quickly becomes hard.
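To make the point concrete, here is a minimal sketch of the Java path for that one string (the table, column, and JDBC URL are hypothetical):

```java
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// One string travels HTML form -> HTTP -> servlet -> JDBC -> SQL.
public class SaveStringServlet extends HttpServlet {
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        String value = req.getParameter("value"); // HTTP form field -> Java String
        try (Connection con = DriverManager.getConnection("jdbc:hsqldb:mem:demo", "sa", "");
             PreparedStatement ps = con.prepareStatement("INSERT INTO notes (body) VALUES (?)")) {
            ps.setString(1, value);               // Java String -> SQL VARCHAR
            ps.executeUpdate();
        } catch (SQLException e) {
            throw new ServletException(e);
        }
        resp.setContentType("text/html");
        resp.getWriter().println("<p>Stored: " + value + "</p>"); // and back out as HTML
    }
}
```

Four technologies and at least three representations of the same string, for the simplest possible feature.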


Developing applications with clean interfaces and separation of presentation, business logic and persistence has been taken to an extreme. Some of the work on orthogonal persistence attempted to simplify the problem, but it ran into its own set of issues around being too tightly coupled.


The internet is a wonderful application development platform, but we are making it too complex to do useful and simple things. While XML has its own set of issues, it does reduce the impedance issue if an XML datastore is used. There are still several technologies in play but only one data format (XHTML/XSLT/XML/REST/XML-Store) - are we better off?

Tuesday, September 16, 2003

Inspirational Technology - Cool stuff

A few people have been commenting on Inspirational Technology (Sam Ruby and Jon Udell). The really cool aspect for me is that it points the way to a truly distributed environment where persistence is in the background, not the foreground, of development.


Much of the work around orthogonal persistence in Java has stalled on synchronization of state between communicating applications. The middle ground that Kimbro takes shows the path forward to a whole new set of collaborative applications that have persistence in the background.


I feel inspired to try to expand on Kimbro's work and create some new services that allow data to flow with transparent persistence.

Monday, September 15, 2003

Jeremy Zawodny's blog: RSS Auto-Discovery 2.0

Anything that makes it easier to connect blogs is a good thing; therefore, by definition, RSS Auto-Discovery 2.0 is a good thing. The value of blogs, apart from being fun, is connecting with others in either a machine- or human-readable fashion. Blogs are a major enhancement to the web in terms of structured relationships.
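For context, the basic auto-discovery convention embeds a link element in a page's header so that machines can find the feed; a minimal example (the href is hypothetical):

```html
<link rel="alternate" type="application/rss+xml"
      title="RSS" href="http://example.com/index.xml" />
```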


To keep blogs moving forward and creating more value, though, we need to keep the right balance between simplicity and power, weighted more toward simplicity.

Sunday, August 24, 2003

HTTP Performance

Nice to see some rationality brought to the HTTP debate by Mark Nottingham in HTTP Performance. Now we just need another article on HTTP Reliability - or why the browser back button is not a protocol artifact.

Thursday, August 14, 2003

Software as a service pros and cons

Phil Wainewright at Loosely Coupled alerted me to the fact that my RSS feed has been stalled since Blogger's last update in early July.

This illustrates the pros and cons of software as a service. The con is the ripple effect: I use Blogger as a service - no client software, no install, available anywhere, and a single update for all users. Therefore a single bug or change in default settings propagates from Blogger to me to Phil, who is trying to consume my RSS feed. The same bug would affect everyone who is using Blogger and consuming RSS feeds.



The pro is that a single fix puts everyone back in business. There is no need to test and deploy software for every platform. This is a radical change in economics and speed of distribution that is changing the playing field. It is no longer about the software; it is about the service.



Sunday, August 10, 2003

Improved productivity...

Microsoft has the power to increase global corporate efficiency with a single change to Outlook: change the default meeting time from 30 minutes to 20. Make the new standard for meetings 20-40-60; I bet we could all get as much done in a 20-minute meeting as we do in a 30-minute one, and the same for 40 versus 60.

Of course this would not stop the creation of more meetings....

Sunday, July 13, 2003

Distributed Computing Economics

Slashdot points to a great article by Jim Gray about the economics of distributed computing: Distributed Computing Economics: Jim Gray. It lays out how to effectively quantify the benefits of distributed computing and which problems work in which areas. It clears through a lot of the hype created around Grid Computing and On-Demand and puts them into perspective.

There is also what I would call a companion article, Mailing Disks is Faster than Uploading Data, that lays out the economics of data transfer. Taken together, these articles make a very good case for designing systems that move as little data around as possible and for centralizing the storage of data as a service in the network.
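The back-of-envelope shape of the argument, as a sketch; the prices below are my own illustrative assumptions, not figures from either article:

```java
// Break-even between moving data over the network and shipping a disk.
public class DataShippingEconomics {
    public static void main(String[] args) {
        double gigabytes = 200.0;     // size of the data set (assumed)
        double wanCostPerGB = 1.0;    // assumed WAN transfer cost, $/GB
        double courierCost = 50.0;    // assumed overnight shipping for one disk
        double networkCost = gigabytes * wanCostPerGB;
        System.out.printf("Network: $%.0f, courier: $%.0f%n", networkCost, courierCost);
        // With these assumptions the wire costs 4x the courier, which is the
        // intuition behind moving computation to the data, not data to computation.
    }
}
```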


With cheap storage we are getting incredibly lazy and creating data everywhere. Our sysadmin regularly sends me e-mail about the bloat in my mailbox. Enterprises have databases everywhere and applications have a tendency to be very database centric.


Most enterprise applications store significantly more information than they actually need. The major impact of this is a data synchronization problem. There are many companies building and selling solutions to synchronize this data, but they are solving the wrong problem. We are designing applications wrong, mainly because we do not realize the impact of having data everywhere - after all, storage is cheap. The real issue is that managing and synchronizing it is very expensive. Solutions like SForce are delivering storage as a service. This forces people to think in new terms: since storage is essentially free, what they are charging for is management. The economies of scale they can bring to managing data reliably are enormous. Over time, I assume they will also provide the visibility and management tools that let you understand the flow of your data as it becomes part of your services network.


The bloat in my in-box is mainly due to office documents. The reason I get the original rather than a link is that the sender wants to send me a snapshot, i.e. the document at that point in time. Since the file system has no versioning (where is VMS when we need it?), they do not send a link, they send the actual document. One company with a really cool approach to this problem is Its the Content; they have the potential to change how information workers interact and to significantly reduce the content in the workspace. [Disclaimer: I have an advisory relationship with ITC.]


Just because it is cheap to build distributed systems and share information does not change the need to consider the fundamental economics and complexity they introduce. The companies that understand this will be able to take advantage of the services network; the others will become mired in complexity.



Monday, June 02, 2003

Patterns of persistence

Jon Udell: Patterns of persistence takes a bold stance on separating persistence in J2EE from relational databases. I agree with his points, and I now have to go play with JBoss 4.0 to see what they have cooked up. For anyone interested in a simple but powerful object database (similar to ObjectStore but with no post-processing) I would recommend Db4o. I have not used it in any production applications but I am looking at using it for my blog visualization tool.


Several years ago Sun had a project to put orthogonal persistence into Java. I am not sure what happened to it, but seeing what can be done with Db4o, I wish they had continued.


However, for most enterprises the technique for getting persistence into the application is not the major issue; it is the integrity of the backup and restore solution. This may be getting easier with RAID drives, but it is still a key issue for any robust persistence solution.



PS Jon, when are you going to enable comments on your blog?

Tuesday, May 20, 2003

Sharing is important....

Over the weekend my laptop died - complete failure of the hard disk. Being a professional, I had of course backed up everything... NOT. What saved me was sharing: I had shared everything I was working on with other people for feedback, comment, etc. By noon Monday I had all my important work back.

This is not a technique I would recommend to anyone, but it does illustrate the concept of a simple backup strategy through statistical copies. It does not require sophisticated hardware, just lots of cheap hardware. The MTBF for most computer hardware is probably measured in years now, so for most non-critical data this is probably not a bad technique.


Could this be done with cron, bash and Samba? It seems pretty simple to keep a list of directories to copy around the network and ensure that all key data is statistically guaranteed to be in more than one place. A project to work on in the mythical spare time.
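A minimal sketch of the idea (in Java rather than cron/bash; the source directories and Samba mount points are hypothetical):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Push a copy of each watched directory to every peer mount, so any key file
// statistically ends up on more than one machine.
public class StatisticalBackup {
    public static void main(String[] args) throws IOException {
        List<Path> sources = List.of(Path.of("/home/me/projects"));  // hypothetical
        List<Path> peers = List.of(Path.of("/mnt/peer1/backup"),     // hypothetical
                                   Path.of("/mnt/peer2/backup"));    // Samba mounts
        for (Path src : sources) {
            for (Path peer : peers) {
                Files.walk(src).filter(Files::isRegularFile).forEach(file -> {
                    try {
                        Path dest = peer.resolve(src.relativize(file).toString());
                        Files.createDirectories(dest.getParent());
                        Files.copy(file, dest, StandardCopyOption.REPLACE_EXISTING);
                    } catch (IOException e) {
                        // a real tool would log and retry; skip for the sketch
                    }
                });
            }
        }
    }
}
```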

Thursday, May 15, 2003

Time for a mobile service approach?

Jon Udell writes in Indexing and searching Outlook email
As I learned this morning, my closing lament -- that the CPython/MAPI and Jython/Lucene halves of this project do not communicate directly -- is somewhat mitigated by the existence of Lupy (1, 2), a Python port of Lucene. But I think the general point still stands. Must every component be rewritten in every language? Let's not go there.

No, let's not go there; let's go to a service-based approach. If Jon makes this into a service we could all use, there is no need for anyone to decide between Python and Java.

Nice idea, but the issue comes down to the data and how to move it around: if I need to send my message store to a service every time I want to do a search, the performance will be awful.

Keeping the constraints - I want to use a service and I want good performance - let's move the mountain. Can the service become mobile and move closer to my application?

Building distributed applications requires balancing the amount of data moved, computational efficiency, and the granularity of the computational unit. Making common web services mobile and putting intelligence into the network to optimize distributed applications may solve the problem.


Wednesday, May 07, 2003

Time to dump Blogger?

Just saw the new graphic for FM Radio and it looks really cool: FM Radio with enhanced SocialDynamX.

It did trigger a reminder to have a little rant about the Blogger/Google merger. I just read the release notes for Dano and I am not impressed - time to move over to Movable Type. Hopefully FM Radio will work with that soon.

Friday, May 02, 2003

Social Software Hype or Revolution

Phil Windley discusses the hard question around social software in What is in a Name: Social Software. I think he has it right: while we always get caught up in hype in our industry, the hype usually has some essence. Dave Winer argues against this in Pig Wont Fly by comparing social software to P2P. Well, now that the hype around P2P is dying down there are real applications and businesses being developed around the technology, e.g. Blue Falcon. The glamour has gone and it is now down to hard engineering and business building. The same has been true of many hyped technologies.

Social Software should not be measured by the hype but by the fundamental concepts it contributes toward moving us forward as an industry. While all our great hype cycles (anyone remember the AI hype...) have generated a lot of wreckage, they have all contributed significantly to our knowledge. The wreckage has been created mainly by unwise VCs and pundits following the herd, not by technologists trying to create something new and valuable.


I am not sure Social Software will contribute as much as P2P, OOP, AI or Web Services, but we need to hold new ideas lightly and encourage them to grow and develop. If we do not, we as an industry will cease to be relevant.


Thursday, May 01, 2003

Blogging: the search for knowledge synthesis

I was recently asked why I blog (or more specifically, why I enjoy it). At first I had a fairly lame answer. After more consideration I realized I am searching for knowledge and information, and part of that search is expressing ideas publicly and getting feedback. My life is enriched by learning from others, but that requires sharing of thoughts - is this the essence of blogging?

At one of the recent Social Software meetings I asked what the end goal of social software for users would be. I am not sure I got a good answer, but then I do not believe there is one good answer. On reflection, the question may be how "Social Software" provides an environment (and the tools, for Marc) where users can find their own reasons to interact and grow in different ways.


Should the goal of Social Software be to provide applications or infrastructure?

Sunday, April 13, 2003

Topics, Metadata, and Tools

It was great to see the publication of Easy News Topics; it adds a new set of metadata to address the tools issue brought up by Marc Canter. Like all good specifications it is simple, designed by only a few people, and fits into other work (i.e. RSS).

The major areas to address, though, are how to create enough clouds we all agree on and how to automatically suggest topics to be added to stories. Having worked in content management (and been a user of many systems), I know it is too easy to just blow by the content categorization screen.


Having publicly available clouds of taxonomies makes the first step possible: we all have a common framework and hierarchy of concepts. The second need is to map these to published articles - sounds like a great service for someone to launch. Ross? Or Matt and Paolo?

Public clouds are a key step to making this work, as trying to get any one set of people to agree on an ontology is the second hardest problem in knowledge management - the hardest is getting them to actually use the ontology when creating content. Interestingly enough, I would presume that bloggers are more likely to perform these manual tasks. Bloggers are trying to share information and create "clouds" of understanding and shared context, so they have the necessary motivation and reward.


When this is extended to the general workplace, we have deadlines and are usually only concerned with (motivated by) communicating to a primary audience. The objective of KM is usually to serve the secondary and tertiary audiences and hence enable information reuse. At this point the need for shared clouds and automated tools becomes key.


Easy News Topics is a great step forward; the next few steps are going to be creating the shared clouds of topic maps and the tools to summarize content. Once we have these basics, a new set of KM applications will arise - cannot wait ;-).

Sunday, March 30, 2003

Identity - Spheres of Influence

Jamie Lewis writes in Ends and Means: Identity in Two Worlds about the misunderstandings in the identity space. Trust, identity and policy are things I have given some thought to in Creating trusted value chains. One reason for the confusion is that there are spheres of influence for identity. These spheres are in many ways different; I would characterize them as inside, outside and among. Inside is within a corporation, where the notion of identity is controlled by the corporate directory (or directories). Outside is your customers coming into the enterprise; they have individual identities that are not managed within the enterprise and are typically not attached to it. Among is when two organizations have a relationship; the relationship may be managed by users with identities within their own organizations, but the basis of the relationship is between the two organizations.

The fundamental differentiation is the legal relationship: inside it is the employment contract we each have with our employer, outside it is between us and the organization providing the service, and among it is between the organizations involved.

Each of these spheres of legal control requires a different approach, both technological and social. Unless we recognize them as different problems we will just have circular conversations and talk at one another rather than at the problem.

Monday, March 24, 2003

Mapping relationships - the next step in knowledge management?

Knowledge management has been described in many ways, from the next big thing to an oxymoron. Much of the past work in knowledge management has focused on the information items rather than the relationships between the creators and consumers of the information. The first major application of the relationships among consumers and producers was Google. A simplistic explanation of the technology is that Google weights rankings by the link relationships between the people who produce and consume information.
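A toy sketch of that link-weighting idea, in the style of a simplified PageRank iteration (the three-page graph is made up):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A page's score is fed by the scores of the pages that link to it,
// iterated until the scores settle.
public class LinkRank {
    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("c"),
                "c", List.of("a"));
        double damping = 0.85;
        int n = links.size();
        Map<String, Double> rank = new HashMap<>();
        links.keySet().forEach(p -> rank.put(p, 1.0 / n));
        for (int i = 0; i < 50; i++) {
            Map<String, Double> next = new HashMap<>();
            links.keySet().forEach(p -> next.put(p, (1 - damping) / n));
            links.forEach((page, outs) -> outs.forEach(target ->
                    next.merge(target, damping * rank.get(page) / outs.size(), Double::sum)));
            rank.putAll(next);
        }
        rank.forEach((page, score) -> System.out.printf("%s: %.3f%n", page, score));
    }
}
```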

There has been a flurry of efforts to visualize relationships among people to better understand how information flows; some even visualize Google itself using its web API (Google Browser). There are several other efforts underway to visualize relationships by examining e-mail databases and looking at the to/from/cc relationships. One made it to C|Net: E-mail patterns map corporate structure | CNET News.com; another, discussed by Ross Mayfield, is an open source project, Apache Agora, by Stefano Mazzocchi. I have been working on my own visualization of RSS links between blogs, Blog Mapping, which is making very slow progress.

Several companies (and informal groups) are starting to appear around this space: Ross Mayfield's SocialText, Semaview looking at relationships using FOAF, and Groxis. I feel this is a sign the next wave is coming, and its foundation is the move from unstructured to structured information. Blogs are one of the key platforms for this move. They provide a rich ecosystem to experiment with various structured data and metadata technologies. How long will it be before blogs become part of all the major platforms?

The key change that structured data/metadata, and hence relationship mapping, brings to knowledge management is that it makes it easier to connect the loosely coupled relationships between producers and consumers. I believe the value of information (and hence knowledge) has a very direct trust relationship with who produces it and who consumes it. Blogs magnify this trust effect: if a trusted node publishes a piece and it is then picked up by several other blogs and discussed, the "knowledge trust metric" is increased. In a recent posting, Degrees of Freedom, Jon Udell quotes Sam Ruby: "It's just data". Yes, but the relationships the data has enhance its value (making it more or less true).

For our understanding of each other and the world, we need to consider the relationships between pieces of information, as this is the context in which we build understanding and trust in each other. Visualizing the flow of information in both time and space is necessary to achieve both understanding and trust. Without these elements we will not make much progress.

Anything that helps to add understanding and build trust in our strife-torn world cannot be a bad thing...

Wednesday, March 12, 2003

Knowledge Networks..

We frequently look for use cases when studying communities of knowledge workers. The largest and most successful (IMHO) knowledge network today is the open source community. One of the few major companies to tap into it in a constructive manner is IBM; the rest show varying degrees of antipathy towards the community. The tools provided by sites such as Freshmeat provide the framework for very effective collaboration and information sharing.

The knowledge captured and shared by the open source community is probably one of the most dynamic and dense public bodies of knowledge available today. The concepts of open source and blogs are very similar: both are built around the open sharing of ideas and allowing the network (i.e. empowered individuals) to select and propagate useful code or posts.

The open source community is probably the largest R&D laboratory in existence today. It is a classic example of simple collaboration tools providing more value than all the complex tools combined have ever produced. While companies such as RedHat have tapped into this and created successful businesses, they have only tapped a small fraction of the value the network has produced. Is there an organizing principle here around which value can be created? By value I do not necessarily mean financial value, but rather new applications that will benefit the community as a whole - applications that move beyond the creation of Linux and GNU to the mass of computer users.

Monday, March 10, 2003

Simple tools - Just the facts

Effective Social Networks on Ross Mayfield's blog brings up the complexity of groupware tools; I would add the whole field of knowledge management as guilty of the same sin. There is a self-perpetuating myth that a complex set of software is needed to manage large amounts of unstructured information. The myth appears to be that this information is hugely valuable and needs sophisticated tools to ensure you can squeeze the last drop of value out of it.

One of my professors taught his class a key rule for any engineering project: always do a rough estimate for any calculation before doing the detailed analysis. This serves several purposes: it makes sure you understand the variables in the problem, and it makes sure you do not misplace any zeros in the detailed analysis. Doing the same in any groupware or knowledge management application is a similarly revealing exercise. Take a representative sample of information and reduce it to the key facts; it quickly shows there is not much there, and what is there does not benefit from complex tools.

So what is all this unstructured data? Mainly attempts to create the evidence to prove the few facts that we are all trying to reach agreement on. Perhaps we should be looking for tools to assemble evidence in favor of the few key truths that we all hope exist.

Friday, March 07, 2003

Thanks to Jon...

Thanks to Jon Udell for pointing out my graph project. Unfortunately I do not think it was ready for the attention. I have some ideas on how to make it more usable and a lot more robust.

He does bring up some interesting points; similar issues have been discussed on Ross Mayfield's Social Networks. Where does technology help form communities, and when does it get in the way? Jon's take on WebX is that it was getting in the way of creating a community. Blogs are much easier but need some way of providing feedback, more like threaded discussions.

To create social networks that can exert political pressure, it is necessary to bind them together to exert the collective will. This binding may be very transient, but to impact any political process it must exist.

Sunday, March 02, 2003

Interactive version with deep crawling

Working under the principle that it is better to release than not, I put up another version of the blog map thing (I need to think of a name). It has two types of crawling: 1) Shallow takes the node you select and finds all the RSS/RDF feeds listed on the page - quick and painless. 2) Deep gets all the links on the page and follows each one down to see if it has an RSS feed that matches the link; if so, it is a blog, so save it - this is slow.... Have patience; I hope you will be rewarded with pretty pictures. There are still many bugs, but I am having fun playing with it so I thought I would share the fun: Graph Tool
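For the curious, here is a self-contained sketch of the two crawl modes; the crude regexes below stand in for the real HTML parsing:

```java
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BlogCrawler {
    static final Pattern FEED = Pattern.compile("href=\"([^\"]+\\.(?:rss|rdf))\"");
    static final Pattern LINK = Pattern.compile("href=\"(http[^\"]+)\"");

    static String fetch(URL page) throws Exception {
        try (InputStream in = page.openStream()) {
            return new String(in.readAllBytes());
        }
    }

    // Shallow: just the feed links listed on the page itself - quick and painless.
    static List<String> shallowCrawl(URL page) throws Exception {
        return matches(FEED, fetch(page));
    }

    // Deep: follow every outbound link and keep those that expose a feed of
    // their own - slow, as the post warns.
    static List<String> deepCrawl(URL page) throws Exception {
        List<String> blogs = new ArrayList<>();
        for (String link : matches(LINK, fetch(page))) {
            if (!matches(FEED, fetch(new URL(link))).isEmpty()) {
                blogs.add(link); // it has a feed, so treat it as a blog
            }
        }
        return blogs;
    }

    static List<String> matches(Pattern p, String html) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(html);
        while (m.find()) out.add(m.group(1));
        return out;
    }
}
```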

Friday, February 28, 2003

New graph tool

I have had a few cycles on planes etc. to update the blog mapping tool, Blog Map. It is now more interactive: you can crawl links that point to RSS or RDF feeds. The nodes are color coded by the type of RSS feed each node has. I still need to add a help button and a bunch of other features to get anywhere near useful. It will update the XML representation as you crawl new nodes. The XML file is available as Map Data.

As always comments welcome.

Monday, February 17, 2003

It is about structured data

Google has announced that it is buying Pyra Labs, providers of Blogger (used by this site). This has already been covered all over the BlogSphere.

One major benefit Google gets from promoting and supporting blogs (apart from a new revenue stream) is more structured data. Structured data such as RSS, LinkBacks and FOAF is raw meat to the carnivorous beast that is Google's search engine. Flat web pages are interesting, but what you can extract from blogs is much more interesting.

In the past (circa 1995) I wrote software to scan the web and try to extract meaning - very hard and painful. In a few weeks I was able to do so much more with mapping than was ever possible then, just because of RSS. Just think what the smart guys at Google are going to do.

It will be some time before the web is completely structured but Google's purchase of Pyra is more about structuring the content and providing more insight.

Friday, February 14, 2003

Updating Mapping Project

The next step of the mapping project is making the data more available and making the graphing tool more interactive. The data is now available as an XML file (I am still working on the schema so it may change). For those who are interested, here is the XML for the current map: Blogmap. I hope to update the graphing tool soon to allow users to explore on their own. As always, comments and suggestions welcome.

Sunday, January 26, 2003

Software as a service

This is an idea that is ready to happen. Jon Udell: Publish globally, script locally talks about it, and in several other postings he has demonstrated it. Like any other major shift (I will avoid the p*** word) this will require a major shift in thinking. As part of my blog mapping project I am going to commit to both using other services and providing mine as a service. If you believe in a revolution you need to stand up and be counted. Open source is just the first step - open services is where the revolution is!

Saturday, January 25, 2003

Dynamic Graphs

If you click on the image from the previous post it will bring up an applet with a map of my local blog space. Navigate mode allows you to set a blog as the center (by clicking on it); then you can shrink and expand the map. Going to edit mode makes the nodes clickable, letting you jump to the various blogs.

Feedback is welcome - I have a few ideas about how to make it more interesting and relevant.

Monday, January 20, 2003

Fun with maps

After my last exploration of mapping blog relationships, I moved over to using a Java-based library, Touchgraph, that allows a more dynamic look at the relationships between the blogs I link to and those that link to them. If anyone has read the book "The Tipping Point", there is a discussion about the various types of people that cause the formation of social groups. I think mapping how blogs relate to one another can illustrate some of the points the author (Malcolm Gladwell) makes. I need to re-read the book, and probably re-read Linked too, before I can draw any conclusions. I will admit to being fascinated with the entire investigation.

Here is my most recent map of the links around my blog. It is a static image generated from a Java application. If I get some time in the next few weeks I will turn it into an applet that will allow users to browse and manipulate the map, with each node expanding into the next level down. I have not gone beyond that - I need to do some more work on my crawler to increase speed and reduce memory consumption.

Tuesday, January 14, 2003

Scarily Cool

Jon Udell demonstrates a scarily cool REST web service in O'Reilly Network: Services and Links [Jan. 13, 2003]. The idea of a web service this cool and easy is humbling. It even works (kinda) with Blogger. Being able to create a useful service without the plumbing necessary in SOAP is somewhat disturbing, but also strangely liberating.


Monday, January 13, 2003

Trust and networks

Ross Mayfield's Weblog writes about Cognitive or Emotive Trust. Trust is something that is often overlooked when security issues come up.

Saturday, January 11, 2003

Open Source and Blogs

One of the biggest learnings from the little mapping experiment I am working on is the power of community. To build the application I relied heavily on the hard work of others in the Open Source Community (many thanks to the teams involved with: HTMLParser, Graphviz, and Xerces).
Being able to share ideas with others in the blogging community such as Stephen Dulaney and Ross Mayfield gives real-time feedback that sparks new ideas. Even though, between the pressures of work and family, I am a little pressed for time, I still feel part of a larger community through blogging. Thanks all.

Thursday, January 09, 2003

Navigation Links

Stephen Dulaney e-mailed me that he liked the navigation map on my blog. Well, it is the season, so Stephen, here is your own navigation map [feel free to cut and paste].

istori/log
Blogging Alone
Python Community Server: Development
a klog apart
Scripting News
Instapundit.com
Ron Lusk's Radio Weblog
Second p0st
Seb's Open Research
Marcus' Tablet PC Radio Weblog
Ross Mayfield's Weblog
evhead
The Shifted Librarian
Jon's Radio
thomas n. burg | randgänge
Universal Rule
Jon Schull's Weblog
Ross Mayfield: Social Networks
Ray Ozzie's Weblog
John Robb's Radio Weblog
RatcliffeBlog: Business, Technology & Investing
Peter Drayton's Radio Weblog
A Man with a Ph.D. - Richard Gayle's Weblog
Marc's Voice
Boing Boing Blog
Steve Gillmor's Radio Weblog
kottke.org
Stephen Rapley
Fast Takes
Sam Ruby
Hugh's ramblings
Jeroen Bekkers' Groove Weblog
John Burkhardt
Jeremy Allaire's Radio
Robb Beal's Radio Weblog

This is just a small example of how to make use of the visualization of blog links. While creating nice graphics is fun, there is a question to be asked: what is the use of visualization?

Jon Schull makes the point that the number of nodes quickly makes the problem very hard to manage. Just going a few levels deep from my home page, I get the following image map with 250+ nodes. It is clickable, and it is fun to navigate around and get to other blogs in one click rather than navigating through several other links.

The value of mapping may not be in mapping the raw connections but lifting patterns out of the maps. Sounds like another project...

Wednesday, January 08, 2003

Changes to RSS to make mapping easier

As part of my mapping experiment I had to parse through HTML and determine by heuristics which link was the RSS feed for a particular URL. There were two problems with this:

  • Determining if the link was an RSS file: naming conventions vary widely; any chance we could all agree to use .rss as the extension?

  • Extend the RSS schema to include referrals as part of the schema; this would allow very fast mapping of blog relationships (a hypothetical sketch follows below).
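Purely as a hypothetical sketch of that second change (the element names are invented, not part of any RSS spec):

```xml
<item>
  <title>Changes to RSS to make mapping easier</title>
  <link>http://example.com/2003/01/08/rss-mapping</link>
  <!-- hypothetical extension: feeds that refer to this item -->
  <referrals>
    <referral href="http://example.org/index.rss"/>
    <referral href="http://example.net/index.rdf"/>
  </referrals>
</item>
```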


These two changes would allow blogs to be machine-mappable without the crufty coding currently required to walk through HTML files. This should lead to lots of unintended consequences....

Mapping my Neighborhood

For a while I have been interested in different ways to look at information and how people interact to create new ideas. After talking about it for a while I decided it was time to actually do something. So over the holidays I hacked together some code to map the surrounding blog space from a starting URL. It is still a little buggy, i.e. I get unexpected results sometimes.

The code parses a starting URL and builds a map matching links to RSS feeds using several heuristics. The rationale is that if you have an RSS feed you are likely a blogger, and therefore you are included in the map. This does exclude some notable sites and may include some non-bloggers, but it gives a pretty good initial picture.

This is the first image map of my local neighborhood, to a depth of three links removed.
Now that I have all this information at my fingertips, the interesting questions are what I can do with it and what other relationships I can start to determine and visualize.