Friday, January 31, 2014

Why Guest Posting Has Gotten A Bad Rap

As a proprietor of a small, successful niche blog, I can easily share some insight into why Google recently started punishing blogs that have guest posts.

At API Evangelist I get about two offers a week from random companies and individuals asking to guest post on my blog. These people cite several reasons for wanting to do it, ranging from me helping them as an aspiring blogger, to them helping me with more content and traffic. If you know me, you know I don't have a problem producing content, and I do not blog because I give a shit about pageviews.

In addition to these smaller, much more frequent requests for guest posting, I also get the occasional bigger company looking to “partner” with me, when in reality they have no desire to partner and generate value for my readers, or move my research forward. These conversations start out entertaining my perspective of partnering and bringing me value, but once I choose to dance with these partners, they almost always start getting heavy-handed about me publishing what they want, and providing links to their own sites and content.

My friend Mike Schinkel has a great post on this very topic, echoing much of what I’m saying. Mike is like me: he blogs for himself, not to generate pageviews. I started API Evangelist to help me better understand the API space, and while I do have a mission of also educating the masses about APIs, the primary directive is still about educating myself. Without this, none of it matters.

Google has begun cracking down because it is in their best interest to ensure blogs offer the highest quality, unique content possible. What these “guest post farmers”, and the enterprise companies that employ this same practice, don't realize is that they aren’t generating any value. They are extracting and diminishing the value of these blogs, and this is what has caught Google’s attention.

On second thought, maybe these companies realize what they are doing. They are just leeches, looking to extract value for their purpose, at all costs. They don't care about your blog, content or career. They want to suck every last page view from your blog's soul, transferring any value you may have had to their operations.


Thursday, January 23, 2014

It Is A Start: IRS Enables Americans To Download Tax Transcripts

I recently talked about how an IRS API would be the holy grail of APIs, and after reading IRS enables Americans to download their tax transcripts over the Internet, by Alex Howard (@digiphile), I’m getting excited that we might be getting closer.

As Alex reports, at a White House Datapalooza last week, the IRS announced the new access:

“I am very excited to announce that the IRS has just launched, this week, a transcript application which will give taxpayers the ability to view, print, and download tax transcripts,” said Katherine Sydor, a policy advisor in the Office of Consumer Policy of the Treasury, “making it easier for student borrowers to access tax records he or she might need to submit loan applications or grant applications.”

The topic came front and center for me as I was working on the FAFSA API, and I realized how critical parents' tax transcripts are to the student aid process. I started considering how important taxes are in not just getting student loans and grants, but home mortgages, insurance and pretty much every other aspect of life.

I’m not going to hold my breath for an IRS API in 2014, but this latest offering shows they are on track to making our tax information available to us over the Internet. The IRS should be able to achieve a modern API ecosystem, as they already have a working model around the IRS e-file platform, but having gained a better understanding of how government works this year, I know it won't be as easy as it may seem.


Monday, January 20, 2014

With Open Data There Is No Way To Submit Back

I have been doing some work on a project to develop an API for the Free Application for Federal Student Aid (FAFSA) form. After getting the core API designed and published, I wanted to provide several supporting APIs that would help developers be more successful when building their apps on the API.

One example is a list of colleges and universities in the United States. This dataset is available at, under the education section. However, it is just a tab-separated text file, which is machine readable, but a little more work for me than if it were available in JSON.

The txt file of all schools was 25.7 MB and contained all elementary as well as secondary schools. For this project I'm just interested in secondary schools, but I needed to process the whole file to get at what I needed.

I imported the file into MySQL. Next I was able to filter by type of school, and get the resulting data set I was looking for, with a couple hours of work.
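For anyone who wants to skip the database step, here is a minimal sketch of the same filtering idea using only Python's standard library instead of MySQL. The sample rows and the column names (name, state, level) are assumptions made up for illustration, not the real header of the schools file.

```python
import csv
import io
import json

# Hypothetical sample standing in for the tab-separated schools file;
# the column names (name, state, level) are assumptions, not the real header.
SAMPLE_TSV = (
    "name\tstate\tlevel\n"
    "Lincoln Elementary\tOR\telementary\n"
    "Roosevelt High\tOR\tsecondary\n"
    "Jefferson High\tWA\tsecondary\n"
)


def split_schools_by_level(tsv_text):
    """Group rows of a tab-separated schools listing by their 'level' column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    grouped = {}
    for row in reader:
        grouped.setdefault(row["level"], []).append(row)
    return grouped


schools = split_schools_by_level(SAMPLE_TSV)

# One JSON document per school type, ready to be written out to files.
secondary_json = json.dumps(schools["secondary"], indent=2)
elementary_json = json.dumps(schools["elementary"], indent=2)
print(secondary_json)
```

With the real file you would read the text from disk rather than from a string, and swap in whatever column actually identifies the type of school.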

Now I have two JSON files, one for elementary and one for secondary schools. The whole FAFSA project is a working example of what can be done with government data, outside of the government, but I wanted to highlight the number of hours put into pulling and cleaning up the school data. The federal government most likely does not have the capacity to accept this work back from me, forcing it to remain external.

I would love to see a way to link up the original list of public elementary and secondary schools with this JSON data set I've produced, so that we can take advantage of the cycles I've spent evolving this data. I'm sure there are other individuals and companies like me who have cleaned up data, and would be happy to submit it back, but there is just no way to do it.

This is why there has to be a sort of DMZ for the public and private sector to interact, allowing the federal government to take advantage of work being done by the numerous folks like me who are working to improve government and build applications using government generated open data.


Sunday, January 19, 2014

Why You Are Missing All The Signals

I get a lot of requests from individuals and companies to partner with them to work on projects. Partnering can range from advising a startup, to working on in-kind or paid projects, or just having a conversation around a specific topic. I guess you could consider each one of these engagements a sort of interview, for lack of a better description.

Many of these requests never get past an email or phone call, but some move forward quickly. I recently had a company who was looking to partner with me on research in a specific area. I got on the phone with the company after a brief email exchange, and the conversation started with someone from this company saying, “I haven’t really looked up much about you, but wanted to talk and learn more.” Immediately the conversation took an interview tone, went on for about 15 minutes and ended. Shortly afterwards I got an email requesting 2 references the company could use.

Now, I don’t have a problem with interviews or providing work references, I’m capable of delivering on both. What I have a problem with is not conducting your due diligence, before getting on a call, and using the legacy interview process as a crutch for your lack of desire to understand who someone is. If you are going to partner with someone, get to know them. Period.

If you don’t Google my name, you are missing out. I’m pretty accessible. You can look at my Blogs, Twitter and Github, and get a pretty good understanding of who I am. I’m working on over 50 projects, engaging in active conversations on Twitter, and actively pushing stories and code to my blogs, using Github. I understand that not everyone is like this in their personal and professional lives, but you should be aware enough to look, and be willing to do 15 minutes of Googling, which is the minimum viable due diligence these days.

In short, if you are missing all the signals I’m putting out daily, we probably aren’t a good partnership. I’ll decline your request politely, and move on. There is nothing wrong with that, it happens in life all the time. The whole process acts as a great filter for me. I just wanted to share it with you, so that you can understand. It's not me, it's you, or maybe it's both of us.


Saturday, January 18, 2014

Adopt A Federal Government Dataset

When I pulled the more than 5,000 datasets from 22 federal agencies after the implementation of OMB Memorandum M-13-13 Open Data Policy-Managing Information as an Asset, I was reminded how much work there is still to do around opening up government data. Overall I gave the efforts a C grade, because it seemed like agencies just rounded up a bunch of data lying around and published it to meet the deadline, without much regard for the actual use or consumption of the data.

Even with these thoughts, I know how hard it was to round up these 5,000 datasets in government, and because of that I can't get these datasets out of my mind. I want to go through all of them, clean them up and share them back with each agency. Obviously I can't do that, but what is the next best thing? As I was walking the other day, I thought it would be good to have a sort of adoption tool, so that anyone can step up and adopt a federal agency's dataset and improve it.

As with most of my projects, it is hard to get them out of my head until I get at least a proof of concept up and running. So I developed Federal Agency Dataset Adoption, and published it to Github. I published a JSON listing of the 22 federal agencies, and the data.json file that each agency published. Next I created a simple UI to browse the agencies and datasets, and to view details and distributions, with the ability to "adopt" a dataset.

When you choose to adopt a federal agency dataset, the site authenticates with your Github account using OAuth, then creates a repository to house any work that will occur. Each dataset you adopt gets its own branch within the repository, a README file, and a copy of the dataset's entry from its agency's data.json file.

I would copy actual datasets to the repo, but many of the references are just HTML or ASP pages, and you have to manually look up the data. Each repo is meant to be a workspace, and users who adopt datasets can just update the data.json to point to any new data distributions that are generated. I will programmatically pull these updates and register with the master site on a regular basis.
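To make that update step a little more concrete, here is a rough sketch of adding a new distribution to a copied data.json entry. The sample entry, its field names (identifier, distribution, accessURL, format) and the add_distribution helper are all assumptions for illustration; the entry an agency actually publishes may be shaped differently, and this is not the code behind the site.

```python
import copy
import json

# Hypothetical data.json entry; the field names (identifier, distribution,
# accessURL, format) are assumed for illustration and may differ from the
# schema a given agency actually publishes.
ENTRY = {
    "title": "Example Schools Listing",
    "identifier": "example-schools",
    "distribution": [
        {"accessURL": "http://example.gov/schools.asp", "format": "text/html"}
    ],
}


def add_distribution(entry, access_url, fmt):
    """Return a copy of the entry with one more distribution appended."""
    updated = copy.deepcopy(entry)
    dist = {"accessURL": access_url, "format": fmt}
    # Skip the append if this exact distribution is already listed.
    if dist not in updated.setdefault("distribution", []):
        updated["distribution"].append(dist)
    return updated


updated_entry = add_distribution(
    ENTRY, "https://example.org/schools.json", "application/json"
)
print(json.dumps(updated_entry, indent=2))
```

The helper leaves the original entry untouched and returns a copy, so the agency's version and the adopted version can be compared side by side.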

The system is meant to help me track which datasets I'm working on, and if other people want to get involved, awesome! I envision it acting as a sort of distributed directory, in which agencies, and consumers of agency data, can find alternate versions of federal government data. Additionally, data-obsessed folks like me can clean up data, and contribute back to the lifecycle in a federated way, using our own Github accounts combined with a centralized registry.

As with my other side projects, who knows where this will go. I'd like to get some other folks involved, and maybe get the attention of agencies, then drum up some funding so I can put some more cycles into it. If you are interested, you can get involved via the Federal Government Dataset Adoption Github repository.