Saturday, January 31, 2015

My Smart Little (Big) Brother And Programmatically Making Sense Of PDFs

I was in Portland, Oregon a couple of weeks ago, and one of the things I do when I visit PDX is drink beer with my little (big) brother Michael (@m_thelander). He is a programmer in Portland, working diligently away at Rentrak. Unlike myself, Michael is a classically trained programmer, and someone you want as your employee. ;-) He’s a rock solid guy.

Anyhoo. Michael and I were drinking beer in downtown Portland, and talking about a project he had worked on during an internal hackathon at Rentrak. I won’t give away the details, as I didn’t ask him if I could write this. :-) The project involved the programmatic analysis of thousands of PDFs, so I asked him what tools he was using to work with PDFs.

He said they were stumbling on the differences between the formatting of each PDF, and couldn’t get consistent results, so they decided to just save each page as an image, and use the tesseract open source OCR engine to read each image. Doing this essentially flattened the differences between PDF types, while also giving him the additional details that tesseract provides.
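A minimal sketch of that page-to-image-to-OCR approach, not Michael's actual code. It assumes the `pdftoppm` (Poppler) and `tesseract` command line tools are installed; the choice of `pdftoppm` for saving pages as images, the 300 DPI default, and all function names here are assumptions for illustration.

```python
import subprocess
from pathlib import Path


def rasterize_cmd(pdf_path: str, out_prefix: str, dpi: int = 300) -> list[str]:
    """Build a pdftoppm command that writes one PNG per page (prefix-N.png)."""
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, out_prefix]


def ocr_cmd(image_path: str) -> list[str]:
    """Build a tesseract command that prints the recognized text to stdout."""
    return ["tesseract", image_path, "stdout"]


def page_number(image: Path) -> int:
    """Pull the page number out of a name like page-12.png."""
    return int(image.stem.rsplit("-", 1)[1])


def pdf_to_text(pdf_path: str, work_dir: str) -> list[str]:
    """Return the OCR text of each page, in page order."""
    prefix = str(Path(work_dir) / "page")
    subprocess.run(rasterize_cmd(pdf_path, prefix), check=True)
    pages = []
    for image in sorted(Path(work_dir).glob("page-*.png"), key=page_number):
        result = subprocess.run(
            ocr_cmd(str(image)), check=True, capture_output=True, text=True
        )
        pages.append(result.stdout)
    return pages
```

Because every page goes through the same image step, the OCR engine never sees how the original PDF was built, which is the flattening effect described above.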

It may not seem like much, but ultimately it is a very interesting approach, and as I continue doing big data projects around things like patents, I’m always faced with the question: what do I do with a PDF? I will have to steal (borrow) from my smart little brother’s work and build a tesseract API prototype.


Wednesday, January 28, 2015

Why Are You So Hard To Get A Hold Of?

This is another post in my ongoing series of regular responses I give to people. Meaning when I get asked something so much, I craft blog posts that live on, and I reply to emails, tweets, etc. with a quick link to my standardized responses.

One I get less frequently, but still often enough to warrant a response, is "why are you so hard to get a hold of?"

To which the answer is, "I’m not". I have a phone number that is very public, three email addresses all going into the same inbox, and a highly active Twitter, LinkedIn, Facebook, and Github presence. If you are having trouble getting a hold of me, it is because you are not using the right channels, or potentially the right frequency.

First, I don’t talk on the phone. I schedule meetings, increasingly only on Thursdays (regularly for partners, etc.), where I talk on Skype, ghangout, and occasionally the phone. When I talk on these channels, I can do nothing else. I can’t multi-task. I am present. If I did this all the time, I wouldn’t be the API Evangelist, I’d be that phone talker guy.

Second, I respond well to quick, concise emails, tweets, wall posts, and Github issues. The shorter and more concise, the better. This is what I mean by frequency: if you send me a long-winded email, there is a good chance it could be weeks before I respond, or I may never respond at all. Sorry, I just don’t have the bandwidth for that frequency. I use short, precise signals.

I do not have a problem with someone being a “phone person”, but I’m not, sorry. In my experience people who require lots of phone calls, also require lots of meetings, and often shift in their needs, because it isn’t anchored to any specific outline, document, or project requirements. Personally I try to avoid these types of personalities, because they have proven some of the least efficient, and most demanding relationships in my professional life.

Please don't take this message the wrong way, I'm trying to help you be as successful as you can in making the right connection.


There Is A Good Chance That I Will Be Remembered For What You Did, Because I Told The Story

My friend Matthew Reinbold (@libel_vox) wrote a great piece on his blog titled, Storytelling and The Developer’s Need To Communicate, reflecting on an un-conference session I did last July at API-Craft in Detroit. Thanks for the great thoughts on storytelling Matt, something that is super infectious, and has reminded me of a related story, which I hope continues to emphasize the importance of storytelling in the API space.

Another one of my friends that I thoroughly enjoy swapping stories with at API conferences, and in the dark corners of bars around the world, is Mike Amundsen (@mamund). Now I may have the name wrong, but one time Mike told me a story about how John von Neumann (correct me if I’m wrong Mike), is known for a lot of ideas that he didn’t necessarily come up with on his own. He was just such a prolific thinker, and storyteller, which allowed him to process other people’s ideas, then publish a paper on the subject before anyone else could. Some people would see this as stealing of ideas, but one can also argue that he was just better at storytelling.

While I have developed many of my own ideas over the years, much of what I write about is extracted from what others are up to across the API space. I have made an entire career out of paying attention to what technologists are doing, and telling a (hopefully) compelling story about what I see happening, and how it fits into the bigger API picture. As a result, people often associate certain stories, topics, or concepts to me, when in reality I am just the messenger—something that will also play out in the larger history, told in coming years.

I’m not that old, but I’m old enough to understand how the layers of history lay down, and have spent a lot of time considering how to craft stories that don’t just get read, but get retold, and have a way better chance of being included in the larger history. As Matthew Reinbold points out, all developers should consider the importance of storytelling in what they do. You don’t have to be a master storyteller, or super successful blogger, but your ideas will be much better formed if storytelling is part of your regular routine, and the chances you will be remembered for what you did increase with each story that you tell.


Tuesday, January 27, 2015

Cybersecurity, Bad Behavior, and The US Leading By Example

As I listened to the State of the Union speech the other day, and stewed on the topic for a few days, I can’t help but see the future of our nation’s cybersecurity policy through the same lens as I view our historic foreign policy. In my opinion, we’ve spent many years behaving very badly around the world, resulting in very many people who do not like us.

Through our CIA, military, and general foreign policy we’ve generated much of the hatred towards the west that has resulted in terrorism even being a thing. Sure it would still exist even if we didn’t, but we’ve definitely fanned the flames until it has become the full-fledged, never-ending profitable war it has become. This same narrative will play out in the cybersecurity story.

For the foreseeable future, we will be inundated with stories of how badly behaved Russia, China, and other world actors are on the Internet, but it will be through our own bad behavior that we will fan the flames of cyberwarfare around the world. Ultimately I will be reading every story of cybersecurity in the future while also looking in the collective US mirror.


Thursday, January 22, 2015

I Judge API Providers On How Much Value They Give Back vs. What They Extract

There are a number of data points I evaluate people and companies on while monitoring the API space, but if I had to distill my evaluation of companies down to one thing, it would be how much value they give back to the community vs. how much they extract.

You see some companies are really good about providing value to the community beyond just their products and services. This is done in many ways, including the open sourcing of tools, creation of valuable resources like white papers and videos, or just being active in sharing the story behind what they do.

Then there are companies who seem to be masters at extracting value from developers, and the wider API community, without ever really giving back. These companies tend to focus specifically on their products and services, and rarely share their code, knowledge, or other resources with the wider API space.

I’m not going to name specific examples of this in action, but after four years of operating in the space it is becoming easier to spot which camp a company exists in--you know who you are. I understand companies have to make money, but I’m totally judging companies across the API space based upon how much value they give the community vs how much they extract during their course of operation.


Wednesday, January 21, 2015

When Will My Router Have Docker Containers By Default?

This is something I’m working on building manually, but when will the wireless router for my home or business have Docker container support by default? I want to be able to deploy applications, and APIs either publicly or privately right on my own doorway to the Internet.

This would take more work than just adding storage, compute, and Docker support at the router level. To enable this there would have to be changes at the network level, which is something I’m not sure telco and cable providers are willing to support. I’ll be researching this as a concept over the next couple months, so if you know of any ready-to-go solutions, let me know.

It seems like enabling a local layer for Docker deployment would make sense, and help move us towards a more programmable web, where notifications, messaging, storage, and other common elements of our digital lives can live locally. It seems like it would be a valuable aggregation point as the Internet of Things heats up.

I could envision webhooks management, and other Internet of Things for household automation living in this local, containerized, layer of our online worlds. This is just a thought. I’ve done no research to flesh this idea out, which is why it’s here on If you know of anything feel free to send information my way.


Machine Readable Format For Migrating Concepts From Dreams Into The Real World

Obviously I’m working with APIs.json and Swagger a little too much, because it has really started to affect my dreams. Last night I had a dream where I was working with a university research team to define a machine readable format for migrating concepts from the dream world into the physical world.

I’m not sure I want this machine readable, but regardless it was a fun dream, and I wasn’t worried about this in the dream, so I guess it is ok. In the dream I was able to go to sleep and dream about a concept, then wake up and apply the same concept in my regular day via my iPhone. It allowed me to pick and choose from a notebook of things I had experienced in my dreams, and then apply in my daily life as I chose.

This post lives in the grey space between my fictional storytelling, and my API Evangelist storytelling, so I’ll leave it here on Kin Lane. If you are planning a startup in this area, let me know. ;-)


Thursday, January 8, 2015

Internet Of Things Security And Privacy Will Always Begin With Asking If We Should Do This At All

As I read and listen to all of the Internet of Things stories coming out of CES, I’m happy to be hearing discussions around privacy and security come out of the event. I feel better about IoT security and privacy when I hear things like this, but ultimately I am left with overwhelming concern about the quantity of IoT devices.

There are many layers to securing IoT devices, and protecting the privacy of IoT users, but I can't help but think that Internet of Things security and privacy will always begin by asking ourselves if we should be doing this at all. Do we need this object connected to the Internet? Are we truly benefiting from having this item enabled with cloud connectivity?

I'm going to try and keep up with tracking the API layer being rolled out in support of IoT devices, but I'm not sure I will be able to keep up with the number of devices, and the massive amount of hype around products and services. At some point I may have to tap out, and focus on specific aspects of IoT connectivity, around what I consider the politics of APIs.


Wednesday, January 7, 2015

Information Sharing And Collaboration In Government With The 18F Jekyll Hub Template

I’m a big fan of Jekyll based sites. All of the API Evangelist network runs as 100+ little Jekyll sites, within Github repositories, via Github Pages. This is more than just website technology for me, this is my workspace. When you come across a half finished listing of contacts, or building blocks for a particular industry, or possibly a story that isn't fully edited, this is all because you are wandering through my API industry workshop. (pardon the dust)

Over the holidays, my girlfriend Audrey Watters (@audreywatters) has completed her migration of Hack Education and her personal blog Audrey Watters, to a Jekyll based mode of operation. Read her own thoughts about the new found freedom Jekyll is giving her over her content, data, workflow and the publishing of her projects—she is pretty excited.

Like APIs, a Jekyll approach to projects is way more than the technology. It is hard to articulate to folks the freedom, collaboration, flexibility, and transparency it has the potential to introduce. It is something you have to experience, and see in action before you can fully understand, but I also have to acknowledge that the transparency introduced by this way of working will not be for everyone.

I originally learned what I know about Jekyll from watching leaders in the federal government space, most specifically Development Seed, and round one Presidential Innovation Fellow, and now full-time Githubber Ben Balter (@BenBalter). Continuing this trend, it makes me happy to see 18F, out of the GSA, providing the 18F Hub, “a Jekyll-based documentation platform that aims to help development teams organize and easily share their information, and to enable easy exploration of the connections between team members, projects, and skill sets.” The 18F Hub is similar to the Developer Hub templates that 18F published, but I think it holds a lot of potential in helping on-board a non-developer audience to the concepts of Jekyll and Github, hopefully making the social coding platform a little less intimidating.

I do not think Jekyll and Github are for everyone. I’m not in the business of evangelizing one platform to rule them all, but I do think Jekyll itself, whether you run on Github, Amazon S3, Dropbox, or your own hosting or internal network environment, is a powerful tool for any project. I’m eager to keep an eye on what agencies put the 18F Jekyll templates to use, because it will signal for me that there are other healthy things going on at the agencies that do.


Tuesday, January 6, 2015

Playing Around With Jekyll Job APIs To Manage My Github Pages

I’m playing around with a concept right now that I’m calling "Jekyll jobs". As you may know, all of my websites use Jekyll, and run on Github Pages. Currently I have over 100 separate repos, and managing the content, and data across these repos can get complex.

I use a standardized approach I call “hacker storytelling” for publishing each of my projects, so I have a handful of things I need to update, ranging from the look of the site, to changes across all Jekyll posts, or pages. To help me keep things orderly and efficient I’m considering a lightweight, API driven, jobs framework to help me manage.

I am looking to make many of these “jobs” available to my girlfriend as well, allowing her to target specific resources, with specific jobs. Some of the jobs I’ve outlined are:

  • Link Management - Help me catalog, and manage the health of links that are used across all blog posts. A lot of links change, go bad, or any other numerous illnesses that occur.
  • Image Management - Help me catalog, optimize, and manage images that are used in my blog posts. I’m pretty good about manually doing a lot of this, but I sure could use help.
  • HTML Management - Sometimes HTML code gets ugly, either because I wrote it and didn’t give it the attention it needed, or possibly because it was generated out of another system, either way there is cleanup and maintenance from time to time.
  • Content Management - After I write a post I like to constantly re-evaluate tagging, indexing, and providing additional links to related content.
  • Content Indexing - My search across all of my Jekyll-driven sites is not the best, and I’d like a way I can index all, or specific collections, and serve it up as a simple search API, maybe using ElasticSearch or something.

As more of my world runs as small, modular, Jekyll projects, I’m needing a way to run jobs against them, and designing APIs that do what I need, and use the Github API to work with my Jekyll site content, makes sense. I’m thinking I will just pass a Github user, and repo name, as parameters to each Github job API, and have it run a specific task against my _posts folder in the Jekyll install.
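A rough sketch of what one of these jobs could look like, using the link management job as the example. It takes a Github user and repo name as parameters, builds the Github API contents URL for the `_posts` folder, and catalogs the links found in post bodies. The function names, the URL regex, and the in-memory `posts` dictionary are all assumptions for illustration; fetching the files and checking link health are left out.

```python
import re

GITHUB_API = "https://api.github.com"

# Simplified URL matcher (an assumption, not a full URL grammar).
LINK_PATTERN = re.compile(r"https?://[^\s)\"'<>]+")


def posts_listing_url(user: str, repo: str) -> str:
    """Github API URL that lists the files in a repo's _posts folder."""
    return f"{GITHUB_API}/repos/{user}/{repo}/contents/_posts"


def catalog_links(posts: dict[str, str]) -> dict[str, list[str]]:
    """Map each URL to the sorted list of post filenames that use it."""
    catalog: dict[str, list[str]] = {}
    for filename, body in posts.items():
        # Strip trailing sentence punctuation and de-duplicate per post.
        urls = {url.rstrip(".,;:") for url in LINK_PATTERN.findall(body)}
        for url in urls:
            catalog.setdefault(url, []).append(filename)
    return {url: sorted(files) for url, files in sorted(catalog.items())}
```

Keeping the job logic as plain functions over a user, a repo, and post content means the same code can sit behind an API endpoint and be pointed at any of the repos.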

Since I’m designing these Jekyll jobs as APIs, I can run each one as an API request, and keep the job logic separate from each project. I’ll get a couple of these set up, then blog more about the pros and cons of this approach. Who knows, it may not be what I need to get it done.


The Rebooting Of WordPress With Just Page, Blog, Image, Link, and Comment APIs

I’m in the process of moving from a custom version of my website, and blog manager, to a newer version. Back in 2011 I wrote my own custom CMS, as I migrated Audrey and me off WordPress, to deliver more security (obscurity) into our world. As I look to continue the evolution of my CMS, I’m leaving everything behind, and just launching APIs, and working from there to build exactly the user interface I will need to manage my world.

Even though I had moved my blog(s) from WordPress three years ago, there was still some serious WordPress residue on everything. Many earlier blog posts have very WordPress-esque HTML, and the graphical template I used was originally a WordPress theme, so there was HTML, comments, and many other fingerprints of the early WP template in there.

As I work through this process, I think of WordPress, and how they were considering putting in a full API with the version 4.1 release. I don’t see any evidence of it on there, so I can only assume they pushed back its release. I don’t blame them. After talking with them about the challenges they face, I can imagine it is taking more work than you can imagine.

I can’t help but think about a WordPress reboot. In my world, I hate legacy code, and technical debt. I am very willing to just throw everything away, and start over, except there is one small difference: I’m not a platform with 65 million users.

However let’s imagine you could! Just reboot WordPress by launching five simple APIs:

  • Pages
  • Posts
  • Links
  • Images
  • Comments

Then let the ecosystem build everything else. Create the first public, and admin UI. Then go from there. Use the brand, the momentum, and the community to reboot, and redefine the popular CMS platform. I think in just a couple of years, you’d see WordPress looking something like SalesForce or Heroku.

For me personally, I like the freedom that comes with using APIs. It makes it easy to decouple legacy code, and evolve small, or even large parts of what I do. Another aspect in which I am very fortunate to do what I do for a living. I think back over my career and all the legacy code bases I’ve had to herd around like cattle, and I am much happier in my current world of APIs.


Friday, January 2, 2015

My Unique Visitors and Page Views For API Evangelist Between Google And CloudFlare

I’ve been running all of my websites using CloudFlare since this last Thanksgiving weekend. I pointed all of my name-servers for my primary domains like and to CloudFlare, and I use them to manage my DNS, and other related operations of my primary websites.

I’m intrigued by the reporting at the DNS level provided by CloudFlare, compared to the reporting at the page level provided by Google Analytics. I’ve had Google Analytics installed on all of my websites since I first launched, and use it to establish the estimates for the daily and monthly visitors to my websites—beyond that I really don’t care much about these analytics.

Regardless I think it is interesting to look at CloudFlare numbers for the last 30 days:

  • Regular Traffic: 112,241
  • Crawlers/Bots: 55,540
  • Threats: 1,697
  • Unique Visitors: 34,501
  • Page Views: 169,478

Then look at the Google Analytics number for the last 30 days:

  • Sessions: 22,569
  • Users: 17,880
  • Page Views: 38,949

Ultimately you can only compare the CloudFlare unique visitors, and Google Analytics users—these are the only two numbers that are comparable in my opinion. I don’t think CloudFlare removes crawlers/bots from page views, something Google does by default I’m assuming—rendering page views as a very different beast for each service.
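A quick arithmetic check on the numbers listed above, written as a small script. It only uses the figures from the two lists; the variable names are made up for illustration. Regular traffic, crawlers/bots, and threats add up exactly to the CloudFlare page view figure, which is consistent with the hunch that CloudFlare page views include bot traffic, though it does not prove how CloudFlare counts.

```python
# Figures copied from the two 30-day lists above.
cloudflare = {
    "regular": 112_241,
    "crawlers_bots": 55_540,
    "threats": 1_697,
    "unique_visitors": 34_501,
    "page_views": 169_478,
}
google = {"sessions": 22_569, "users": 17_880, "page_views": 38_949}

# Do the three CloudFlare traffic categories account for all page views?
cf_total = cloudflare["regular"] + cloudflare["crawlers_bots"] + cloudflare["threats"]
includes_bots = cf_total == cloudflare["page_views"]

# Share of CloudFlare page views attributed to crawlers/bots.
bot_share = round(cloudflare["crawlers_bots"] / cloudflare["page_views"], 3)

# How far apart are the two services on roughly comparable numbers?
visitor_ratio = round(cloudflare["unique_visitors"] / google["users"], 2)
page_view_ratio = round(cloudflare["page_views"] / google["page_views"], 2)
regular_vs_google = round(cloudflare["regular"] / google["page_views"], 2)

print(f"CloudFlare categories sum to page views: {includes_bots}")
print(f"Crawler/bot share of CloudFlare page views: {bot_share:.1%}")
print(f"Unique visitors vs. users: {visitor_ratio}x")
print(f"Page views, CloudFlare vs. Google: {page_view_ratio}x")
print(f"Regular traffic only vs. Google page views: {regular_vs_google}x")
```

Even with crawlers/bots and threats removed, the CloudFlare regular traffic figure is still close to three times the Google Analytics page views, so bots alone do not explain the gap.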

I take away two things from this. 1) How meaningless data points are, unless you believe in them. 2) How data points can differ from provider to provider, and at different levels of your architecture. If you ask me what my page views are for API Evangelist, what do I say? You didn’t ask me whether it was my CloudFlare or my Google Analytics page views!

When I think about the millions of hours spent in Google Analytics dashboards across numerous industries, and the companies I’ve seen spending millions in Adwords for their advertising, all based upon numbers derived from this constructed reality, that we’ve collectively bought into—I am blown away.