Posts Tagged ‘linked data’

Crowdsourcing Transparency of Information not just Data: URI.gov-Permalinks to improve the quality of government data and a reuse of GO.USA.GOV.

March 20th, 2010

Is there a low cost shortcut to make government data more connectable without having to tackle the barriers of a full linked data or RDF approach?

In order to make government transparent, we must make government data sets able to connect to each other.  One example of data which needs to be connected is the chain from policies to procedures to efforts in government. This is the DNA of government, and this information is needed to truly innovate government.  But there are many other examples of government datasets, both kept internally and released publicly, which would benefit from connectivity standards: for instance, just being able to find data which applies to a specific office in an agency or a specific program.

Problem: How do you increase the connectivity of government data so it can evolve into government information?

And is there a low barrier and low cost path to get there?

Data becomes useful and can start to spur innovation when it is connectable to other data.  One way to make information more connective is to have field definitions called XML tags which indicate the type of information in that field.  So if we have a tag called <agency> and that field contains names of government agencies, then you should be able to connect that information with other data sets about government agencies.

But there’s a problem with having common field names (data standards)!

Unfortunately, even though everyone may agree on the name of the field or have a data standard, they may not agree on how to refer to specific agencies.  In short, the data doesn’t have a connectivity standard. So the values in these fields might differ between data sets.  A field may read “EPA” or “Environmental Protection Agency” or “US Environmental Protection Agency.”   So even if two data sets share the same XML language, they may not be easily connectable, as a computer doesn’t necessarily know that EPA = Environmental Protection Agency = US Environmental Protection Agency.

A solution borrowed from Linked Data or RDF?

So what we need is a consistent way to name things in the world.  Linked or Semantic Data has a concept called a Uniform Resource Identifier or URI.  For our purposes, a URI is simply a URL or web page address which is permanent and serves as the definition for an object or concept.  This is similar to the concept of having a primary key in a database, but in this case the database is the internet.
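Here is a minimal sketch of that primary-key idea in Python. Everything in it is an assumption made up for illustration: the URI http://uri.gov/agency/epa, the alias table, the field names and the sample records. It only shows how two data sets that spell an agency differently can be joined once both spellings resolve to the same URI.

```python
# Hypothetical identifier: invented for this example, not a registered URI.
EPA_URI = "http://uri.gov/agency/epa"

# Hypothetical alias table: every known spelling maps to one URI.
ALIASES = {
    "epa": EPA_URI,
    "environmental protection agency": EPA_URI,
    "us environmental protection agency": EPA_URI,
}


def to_uri(agency_name):
    """Return the URI for an agency name, or None if the name is unknown."""
    # Lowercase, drop periods and collapse whitespace before the lookup.
    key = " ".join(agency_name.lower().replace(".", "").split())
    return ALIASES.get(key)


def join_on_agency(left, right):
    """Pair records from two data sets whose agency fields share a URI."""
    by_uri = {}
    for record in right:
        uri = to_uri(record["agency"])
        if uri is not None:
            by_uri.setdefault(uri, []).append(record)
    pairs = []
    for record in left:
        uri = to_uri(record["agency"])
        for match in by_uri.get(uri, []):
            pairs.append((record, match))
    return pairs


# Made-up sample records; the agency labels differ but the URI is the same.
budgets = [{"agency": "EPA", "item": "example budget line"}]
programs = [{"agency": "US Environmental Protection Agency",
             "program": "example program"}]
matched = join_on_agency(budgets, programs)
```

A plain string comparison of the two agency fields would find no match here; the join only works because both labels resolve to one identifier.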

How do we get started?

I know what you are thinking.  This is another major IT project for which we don’t have funding.  Or this means that web managers will have to rename pages and then keep them permanently, which would be a huge problem.

It’s easy. Make it a redirect!

To avoid adding to the complexities of maintaining government websites, we would use a redirect service to serve as the repository of the URLs.  In RDF parlance, these are called PURLs (Persistent Uniform Resource Locators).   There would be only one database of the URIs, and it would not have to contain the definition pages; it would simply redirect to a web address.  So if a new content management system is installed, we simply update the central URI registry.
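The redirect idea can be sketched as a toy in-memory registry. This is only an illustration under assumptions of my own: the class name, the "uri/agency/epa" naming pattern, the example.gov addresses and the choice of a 302 status are all hypothetical, and this is not how Go.USA.gov is implemented.

```python
class UriRegistry:
    """Toy registry mapping permanent names to current web addresses."""

    def __init__(self):
        self._targets = {}

    def register(self, name, target_url):
        """Create or update the address a permanent name redirects to."""
        self._targets[name] = target_url

    def resolve(self, name):
        """Return (status, location) roughly as a redirect service might."""
        target = self._targets.get(name)
        if target is None:
            return 404, None
        # 302 is an assumed choice; a real service might answer differently.
        return 302, target


registry = UriRegistry()

# The permanent name points at wherever the definition page lives today.
registry.register("uri/agency/epa", "https://www.example.gov/old-cms/page?id=123")
before = registry.resolve("uri/agency/epa")

# A new content management system moves the page: only the registry changes.
registry.register("uri/agency/epa", "https://www.example.gov/new-cms/about-epa")
after = registry.resolve("uri/agency/epa")
```

The permanent name never changes, so any data set that stored it keeps working after the page moves.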

Gee, if only we had a low cost redirecting service which could handle high volume!

Well we do. It’s called Go.USA.Gov.  The purpose of this redirecting service is to offer a URL shortening service for federal agencies. This is needed and a great idea. I propose we give this service a dual use. We add a subdirectory “/uri” and allow people to choose a URL name instead of it always being shortened.  This is similar to how the commercial service tinyurl.com works.

Or for slightly more effort…

We could copy the existing code, make the custom name modification and change the name to URI.gov. This won’t require any additional license purchase since the code is open source, and it most likely could still be placed on the same server and use the same resources as Go.USA.gov.

But who would make up all of the URIs?

This is where the crowdsourcing aspect comes in. You simply set up the service and the standards for how to do this, and you evangelize the potential for greater transparency to web managers and the CIOs.

But it won’t be perfect!

No, it won’t, as is the case with other crowdsourcing efforts. But it is a way to solve 90% of the problem for 0.1% of the effort. Once it gets going and the value of the URIs becomes clear, it can be officially incorporated into federal procedures by the CIO council or web managers council.  We can then start to use the service to connect Policy to Procedure to Effort as well as make connections between other useful datasets about government. This does not have the barriers of taxonomy and technology which a true linked data strategy has, but provides some of the benefits of the linked data model.

So cheap, quick and potentially very useful.   Why not?

Note to Linked Data guys: This is not to say we should not move toward RDF or RDFa; instead, it is a way to show the power of one of their benefits.

Linking Benefits to Federal Spending to drive Government Innovation.

August 4th, 2009

Opportunity: Spending of government money should have a purpose, and that purpose should be for the benefit of someone, whether directly or indirectly.  The benefit might be for an employee to work better, and that employee might be working to benefit a group of citizens. The administration wishes to create a more transparent, effective and innovative government as well as to reduce the federal deficit. In order to do this, the administration must identify opportunities for innovation which can increase efficiency as well as decrease spending, and make the case to the American people that it is making more effective use of taxpayer funds.  I want to make the case here that linking spending data to the benefits of that spending in ways which are detailed, clear and relevant to large numbers of citizens is the best way to find innovations to create a more effective government as well as to make transparency have meaning and value for the average citizen.

Challenge:

  • Linking Spending to Benefits: Federal spending is reported in ways which do not clearly connect it to the benefits that specific expenditures provide.  While certain dollar amounts may be reported as going toward ‘Defense’, that is not specific enough to understand whether a given expenditure is justifiable, and it doesn’t allow an expenditure or group of expenditures to be compared to alternative solutions for the same specific benefit-oriented goal.  Therefore we must find ways to better connect specific spending to specific benefits.

Policy, Procedure and Effort as Data: Mapping Government as the first Step to Reinventing it

March 22nd, 2009

This post is in beta. I am looking for help in better understanding the connection between policy and effort so we can discuss it at the upcoming Gov 2.0 camp.  I am not an expert by any means in this area, but am struggling to understand the problem from a data perspective.  The semantic web initiatives and in general the goal of a collaborative government drove me to seek this understanding of how policy is connected to effort.

One of the 3 things which the NAPA paper on Enabling Collaboration:  Three Priorities for The New Administration identifies as a barrier to a more collaborative government is  ‘An inability to relate to information, and information to decision making.’   This hints at a critical problem in creating new initiatives which is not having enough information to plan a path to implement a new initiative.  I believe the solution is to map the connections between policy, responsibility, effort and procedure as critical pieces of data to inform decision making.  This has the potential to speed  progress in creating a more agile, innovative and collaborative government  just as mapping the genome has sped progress in genetics.

Specifically, I see missing connections between policy, responsibility, procedure and effort required to create new initiatives.   Let’s call it PREP (Policy-Responsibility-Effort-Procedure) data since everyone loves an acronym.   So PREP is essentially a line connecting 4 points from policy to the person trying to create a new initiative: from a policy at a high level, to offices which have responsibility to ensure the policy is followed, to procedures created by those offices, and to the effort to follow those procedures.  (I am sure in reality it’s more complicated than that, but let’s keep it simple for argument’s sake.)  Of course each initiative has multiple policies it must be compliant with, so there are multiple lines between the effort and policies.  The procedures are often interdependent, yet created independently by separate offices, often in isolation from what other offices do.  In the end you have a thick mesh to get through, one that needs to be rediscovered for each new initiative.   I have come to the conclusion that mapping PREP data is critical to creating a more collaborative, agile and innovative government.

The problem starts with policy being handled as a 19th century invention, that is as an isolated document.   Then the document is passed to various departments with responsibility to make sure the policy is followed.  These departments create procedures to ensure the policy is followed. When someone wants to create a new initiative or project, they need to determine all procedures from all policies involved and then put in an effort to follow  these often disjointed procedures which often have hidden interdependencies.  This seems to be a primary cause of what we commonly call the ‘bureaucracy’.

Many well intended policies come together to produce unintended, entangled procedures which form a barrier to quickly creating new initiatives.  Essentially this is an emergent property of the many policies which have been implemented over the years, as well as the many offices created to follow the many policies.  The result is fewer or slowed new initiatives, leading to less innovation and collaboration. (Since almost by definition collaborative efforts will involve new initiatives.)  A confounding problem is that new technology is causing procedures to have to be reconsidered and policies reinterpreted, which adds to the complexity.

There is data on policy to effort connections, but it does not seem to be centrally accessible or uniformly stored.  And interpretations differ widely between the individuals at every node in the PREP data, and can change with every decision, which compounds the problem of understanding what is really happening.

New policies to instigate new initiatives are now being issued and fast results are expected, but because of this unseen mesh which holds up execution, the top levels are frustrated with the work not getting done.   Meanwhile the people in agencies feel that they are too constrained to get the new initiative started.  Since the mesh is invisible, solutions to change the system become confusing and difficult to follow because they normally add to the mesh rather than disentangle and streamline it.

Solution: Map the PREP (Policy-Responsibility-Effort-Procedure) data and use this map to create guidance on streamlining implementation of policies as well as identifying duplicate or unnecessary procedures.  The data should include the amount of ongoing or one time man hours involved in the effort to follow a procedure, the average calendar delay caused by a procedure, and any interdependency with other procedures.
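As a rough sketch of what one PREP record could look like, assuming field names I made up for this example (nothing here is an existing standard), the data items listed above might be captured like this:

```python
from dataclasses import dataclass, field


@dataclass
class Procedure:
    """One procedure in the PREP map; all field names are hypothetical."""
    name: str
    policy: str                      # policy the procedure enforces
    responsible_office: str          # office that owns the procedure
    one_time_hours: float = 0.0      # one time man hours to follow it
    ongoing_hours_per_year: float = 0.0
    average_delay_days: float = 0.0  # average calendar delay it causes
    depends_on: list = field(default_factory=list)  # other procedure names


def summarize(procedures):
    """Total the effort and list dependencies that are not yet mapped."""
    names = {p.name for p in procedures}
    return {
        "one_time_hours": sum(p.one_time_hours for p in procedures),
        "ongoing_hours_per_year": sum(
            p.ongoing_hours_per_year for p in procedures
        ),
        "unmapped_dependencies": sorted(
            {d for p in procedures for d in p.depends_on if d not in names}
        ),
    }


# Made-up records for a single new initiative.
initiative = [
    Procedure("example security review", "example security policy",
              "example CIO office", one_time_hours=8.0,
              average_delay_days=30.0),
    Procedure("example privacy assessment", "example privacy policy",
              "example privacy office", one_time_hours=4.0,
              ongoing_hours_per_year=2.0,
              depends_on=["example security review",
                          "example legal sign-off"]),
]
summary = summarize(initiative)
```

Even this crude summary surfaces a dependency ("example legal sign-off") that nobody has mapped yet, which is the kind of hidden mesh the post is about.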

How? Initially just collect the data in a standardized and centrally accessible format.   It will be almost immediately useful. Use collaborative techniques to collect a lot of data quickly even if it means lower quality data initially. Then gradually move the data to a Semantic/RDF storage system where it can be queried in many different ways and linked to the broader set of definitions such as law, case history etc.
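To give a feel for why an RDF-style store helps with querying, here is a small sketch that uses plain Python tuples as subject-predicate-object triples. It is not a real RDF library, and every identifier and predicate name in it is invented for illustration.

```python
# Hypothetical PREP links stored as (subject, predicate, object) triples.
TRIPLES = [
    ("policy:example-privacy", "assignedTo", "office:example-cio"),
    ("office:example-cio", "issues", "procedure:example-review"),
    ("procedure:example-review", "requires", "effort:example-form"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


def follow(triples, start):
    """List every node reachable from start, in the order discovered."""
    seen = []
    queue = [start]
    while queue:
        node = queue.pop(0)
        for _, _, target in match(triples, subject=node):
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen


# Trace the line from one policy down to the effort it creates.
chain = follow(TRIPLES, "policy:example-privacy")
```

The same list can be queried from the other end too, for example match(TRIPLES, obj="effort:example-form") to ask which procedure is behind a given piece of effort.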

This will be the start of making a more agile, collaborative and data centric government.

The Challenge to this approach: Data management is not too bad initially.   The harder part is that a lot of these hidden paths are not 100% OK under 100% of interpretations of policies, so how do we create a collaborative environment without people worrying that the interpretations which allow them to get things done will be ruled to be incorrect?  It seems this needs to be a research project that can’t be looked at for that purpose.