As an add-on to our presentation we produced two more things that some of you out there might find helpful too:
- Mappable Toolset: The number of tools for processing data, making maps, building interactive visualizations and so on is continuously growing. While we love new tools, this growth makes it hard to keep an overview of which tools are good for certain tasks, where to find them, and how much they cost. To keep track of the tools we've used so far, and as a guide for others, we collected our toolset. Have a look at it here: English version, German version.
- Mappable Cheat-Sheet: Making maps and other visualizations with a geospatial component is certainly not a trivial task. There are many pitfalls; take spatial reference systems alone: handled incorrectly, they can completely mess up your visualization. We thus created a checklist for making geodata visualizations in (data-driven) journalism. You can find it here: English version, German version.
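To make the spatial-reference pitfall concrete: the same coordinate pair means very different things in WGS84 (degrees, EPSG:4326) and Web Mercator (metres, EPSG:3857), and mixing them up silently breaks a map. A minimal sketch of the standard forward Web Mercator formula (the example city coordinates are just an illustration):

```python
import math

EARTH_RADIUS = 6378137.0  # WGS84 semi-major axis in metres

def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Project WGS84 lon/lat (EPSG:4326) to Web Mercator (EPSG:3857)."""
    x = math.radians(lon_deg) * EARTH_RADIUS
    y = math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2)) * EARTH_RADIUS
    return x, y

# Feeding raw degrees into a map that expects metres (or vice versa)
# is exactly the kind of mistake that breaks a visualization.
x, y = wgs84_to_web_mercator(8.54, 47.37)  # roughly Zurich
```

In practice you would use a projection library rather than hand-rolled formulas, but the point stands: always know which reference system your coordinates are in.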
Free and Open Company Data on millions of companies and corporations in over 20 countries, including UK, Spain, US, ...
What is OpenCorporates?
OpenCorporates aims to do a straightforward (though big) thing: have a URL for every company in the world.
Is that all?
Well, no. Useful though that would be, we're also gradually importing government data relating to companies and trying to match it to specific companies.
Why do this?
Few parts of the corporate world are limited to a single country, and so the world needs a way of bringing the information together in a single place, and more than that, a place that's accessible to anyone, not just those who subscribe to proprietary datasets. See also the OpenCorporates Principles.
There are quite a few countries you're missing
We've grown from 3 territories and a few million companies to over 75 jurisdictions and 55 million companies, and are working with the open data community to add more each week.
How can we get hold of the data?
We have a new API service, as well as our highly popular Google Refine reconciliation service (see documentation), both of which give access to the information as JSON or XML. If you need data in bulk, whether for academic research, for another cool open data project, or commercially, drop us an email: email@example.com.
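To give a feel for working with the JSON the API returns, here is a hedged sketch of pulling company names out of a search response. The response layout shown is an assumption modelled on the v0.4 company-search endpoint; check the live API documentation before relying on it:

```python
import json

# A hypothetical excerpt of an OpenCorporates company-search response;
# the exact field layout is an assumption and may differ from the live API.
sample_response = json.loads("""
{
  "results": {
    "companies": [
      {"company": {"name": "EXAMPLE HOLDINGS LTD", "jurisdiction_code": "gb"}},
      {"company": {"name": "EXAMPLE INC", "jurisdiction_code": "us_de"}}
    ]
  }
}
""")

def company_names(response):
    """Extract (name, jurisdiction) pairs from a search response."""
    return [(c["company"]["name"], c["company"]["jurisdiction_code"])
            for c in response["results"]["companies"]]

names = company_names(sample_response)
```

The same few lines work whether the response comes from a saved file or a live HTTP request.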
CKAN is open source, free software. This means that you can use it without any license fees, but more importantly, when you choose CKAN for your catalog you are also ensuring that you retain all rights to the data and metadata you enter, giving you freedom to move it elsewhere or manipulate it with your own tools without restriction.
There are lots of different open source licenses (you can find them at http://opensource.org) – CKAN is licensed specifically under the terms of the GNU Affero GPL v3.0. One of the strengths of the open source model is in the communities that form around free software products. The CKAN community is no different, and is arguably one of the strongest open data communities in the world. Together, the CKAN community has a wealth of knowledge and expertise that other people using the CKAN software can draw on. The Open Knowledge Foundation draws on and contributes to this rich resource to help us drive CKAN product development.
A list of top 5 open source project management tools for 2014.
Opensource.com covered some popular open source project management tools (ProjectLibre, ]project-open[, and OpenProject). We found these articles to be valuable to our readers, so here we take a look forward at what we think 2014 holds for these open source project management tools.
This is by no means an exhaustive list, but each tool listed here has been deliberately selected based on a rich feature set:
Citizens and (government) agencies create and collect a lot of data, which they are now opening up for reuse more and more. This dashboard makes use of the latest open data from a wide range of municipal services. By using the Linked Data API from the CitySDK project, this City Dashboard becomes easily transferable to other cities using the same interface. The CitySDK Linked Data API also makes information searchable and available on demand, enabling developers to create applications such as this dashboard.
Download the Full Report NHS England and The GovLab at New York University have jointly created a blueprint – The Open Data Era in Health and Social Care – to accelerate the use of open data in healthcare settings.
The Value of Open Data: The blueprint suggests a framework to review the potential for open data in:
• Holding healthcare organizations and providers accountable for treatment outcomes.
• Enabling patients to make informed choices from among the healthcare options available to them.
• Improving the efficiency and cost-effectiveness of delivering healthcare.
• Improving treatment outcomes.
• Educating patients and their families and making healthcare institutions more responsive.
• Fueling new healthcare companies and innovation.
Need for Evidence: For all the recognition of open data’s potential, there is an urgent need for more research and actionable evidence to help guide investments and priorities. This blueprint represents a framework for a conversation on how NHS England can develop an evidence-based program to guide the investments and further research into the benefits of open data.
For data brokers that provide marketing products, Congress should consider legislation to:
- Centralized Portal. Require the creation of a centralized mechanism, such as an Internet portal, where data brokers can identify themselves, describe their information collection and use practices, and provide links to access tools and opt-outs;
- Access. Require data brokers to give consumers access to their data, including any sensitive data, at a reasonable level of detail;
- Opt-Outs. Require opt-out tools, that is, a way for consumers to suppress the use of their data;
- Inferences. Require data brokers to tell consumers that they derive certain inferences from raw data;
- Data Sources. Require data brokers to disclose the names and/or categories of their data sources, to enable consumers to correct wrong information with an original source;
- Notice and Choice. Require consumer-facing entities – such as retailers – to provide prominent notice to consumers when they share information with data brokers, along with the ability to opt-out of such sharing; and
- Sensitive Data. Further protect sensitive information, including health information, by requiring retailers and other consumer-facing entities to obtain affirmative express consent from consumers before such information is collected and shared with data brokers.
PDF: “Data Brokers: A Call for Transparency and Accountability”
As part of an ongoing effort to build a knowledge base for the field of opening governance, the GovLab Wiki provides a collaborative repository of information and research at the nexus of technology, governance and citizenship. Every two weeks, The GovLab Blog will publish a snapshot of recent additions posted to the wiki. The following is a summary […]
Our latest updates to the wiki focus on platforms for engaging the crowd, finding and engaging expertise and innovating at the local level – from improved hyperlocal city news to reducing strain on hospital emergency rooms.
• Tools for collaboration, like GitHub, have exploded in popularity over the course of the last four years – both in terms of active users and engagement on the platforms by those users.
• Crowdsourcing is increasingly being used as a tool for addressing public problems and needs. SeeClickFix, for example, is helping local governments intelligently address problems affecting their citizens, and, perhaps surprisingly, these citizen-identified problems are being addressed in large numbers. Neighbor.ly, on the other hand, provides local governments with a platform for raising funding for public projects with demonstrated importance to citizens.
• Tools for identifying and engaging individuals with specific skills or interests – whether in terms of job experience, like LinkedIn, Futures.inc and oDesk; charitable interests, like Catchafire; or academic research focus, like ResearchGate – are becoming more prevalent, demonstrating the growing interest in such abilities across sectors.
• Tools for providing hyperlocal information, like EveryBlock, and personally relevant data, like Propellor Health can help improve citizens’ use of public services ranging from emergency rooms to community cultural centers.
Bitcoin (bitcoin.org) is a digital, cryptographically secure currency. Transactions between public-key "addresses" maintained in a distributed, verified public ledger form a transaction network that can be studied by network scientists. This code processes binary-format Bitcoin .dat files generated by the Bitcoin client (bitcoin.org, tested on v0.5.3.1 or lower) into human-readable flat-file formats, retaining all available information. Furthermore, we provide a data model to facilitate storage and querying in a relational database.
2. Bitcoin transaction overview:
The bitcoin digital currency allows users to securely prove ownership of a portion of coins that cascade through the network as a chain of re-assigned ownership transactions over time. A transaction on the bitcoin network is a many-to-many mapping, executed by a user who has ownership of (potentially many) outputs of previous transactions; the user takes this owned value and assigns ownership to (potentially many) output addresses (users, represented by addresses in the network).
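The many-to-many structure described above can be sketched as a tiny in-memory model. The transaction IDs, addresses and values below are invented for illustration and are not real blockchain data:

```python
from dataclasses import dataclass

@dataclass
class TxOutput:
    address: str   # receiving address
    value: float   # amount in BTC

@dataclass
class Transaction:
    txid: str
    inputs: list   # (prev_txid, output_index) pairs being spent
    outputs: list  # list of TxOutput

# A user who owns the output of an earlier transaction spends it in a
# single transaction paying two new addresses (the many-to-many shape).
ledger = {}
coinbase = Transaction("tx0", inputs=[], outputs=[TxOutput("addrA", 50.0)])
ledger[coinbase.txid] = coinbase
spend = Transaction(
    "tx1",
    inputs=[("tx0", 0)],
    outputs=[TxOutput("addrB", 30.0), TxOutput("addrC", 20.0)],
)
ledger[spend.txid] = spend

def input_value(tx, ledger):
    """Sum the value of the previous outputs a transaction spends."""
    return sum(ledger[txid].outputs[i].value for txid, i in tx.inputs)

def output_value(tx):
    return sum(o.value for o in tx.outputs)
```

Chaining transactions this way is what produces the network of re-assigned ownership that network scientists can study.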
Recently Plugged In to Gnip partner, Qlik, and international relief organization, Medair, hosted a hackathon focused on using social data to inform global disaster response.
At Gnip, we’re always excited to hear about groups and individuals who are using social data in unique ways to improve our world. We were recently fortunate enough to support this use of social data for humanitarian good first-hand. Along with Plugged In to Gnip partner, Qlik, and international relief organization, Medair, we hosted a hackathon focused on global disaster response.
The hackathon took place during Qlik’s annual partner conference in Orlando and studied social content from last year’s Typhoon Haiyan. Historical Twitter data from Gnip was paired with financial information from Medair to give participants the opportunity to create new analytic tools on Qlik’s QlikView.Next BI platform. The Twitter data set specifically included Tweets from users in the Philippines for the two week period around Typhoon Haiyan in November of 2013. The unique combination of data and platform allowed the hackathon developers to dissect and visualize a massive social data set with the goal of uncovering new insights that could be applied in future natural disasters.
Big-data researchers have the option to stop doing their research once they have the right result. In options language: The researcher gets the “upside” and truth gets the “downside.” It makes him antifragile, that is, capable of benefiting from complexity and uncertainty — and at the expense of others.
But beyond that, big data means anyone can find fake statistical relationships, since the spurious rises to the surface. This is because in large data sets, large deviations are vastly more attributable to variance (or noise) than to information (or signal). It’s a property of sampling: In real life there is no cherry-picking, but on the researcher’s computer, there is. Large deviations are likely to be bogus.
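The point about spurious relationships is easy to demonstrate: correlate one random series against many other random series, and the best match looks impressive purely by chance. A self-contained sketch using nothing but noise:

```python
import random
import statistics

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

rng = random.Random(42)
n = 30               # a short series
n_predictors = 500   # many candidate variables, as in a wide data set

target = [rng.gauss(0, 1) for _ in range(n)]
best = max(
    abs(pearson(target, [rng.gauss(0, 1) for _ in range(n)]))
    for _ in range(n_predictors)
)
# `best` is typically sizable even though every series is pure noise:
# the spurious rises to the surface exactly as the text describes.
```

With more candidate variables, the best spurious correlation only gets stronger, which is the trap big data sets.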
Graph databases use graph structures (a finite set of nodes, edges and properties) for data storage. They provide index-free adjacency, meaning that every element is directly linked to its neighbouring elements, so no index lookups are necessary. Graph databases are faster than relational databases for associative data sets; since they do not need join operations, they scale naturally to large data sets.
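Index-free adjacency just means each node holds direct references to its neighbours, so traversal is pointer-following rather than index or join lookup. A minimal sketch of the idea:

```python
class Node:
    """Graph node holding direct references to its neighbours
    (index-free adjacency: no lookup table is needed to traverse)."""
    def __init__(self, name):
        self.name = name
        self.neighbours = []

    def connect(self, other):
        self.neighbours.append(other)

# Build alice -> bob -> carol; traversal follows references directly,
# with none of the join operations a relational database would need.
alice, bob, carol = Node("alice"), Node("bob"), Node("carol")
alice.connect(bob)
bob.connect(carol)

def reachable(start):
    """Names reachable from `start` by following neighbour references."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node.name not in seen:
            seen.add(node.name)
            stack.extend(node.neighbours)
    return seen
```

A real graph database adds persistence, transactions and a query language on top, but the traversal cost model is the same: proportional to the edges you actually follow, not to the size of an index.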
We are working on a user-friendly, open-source, vote and debate tool, crafted for parliaments, parties and decision-making institutions that will allow citizens to get informed, join the conversation and vote on topics, just how they want their representatives to vote. A tool that will transform the noise we create during protests into a signal that has a clear, direct and strong impact on the political system. Our vision is that DemocracyOS will become the operating system of a more open and participatory government. Live Demo for the City of Buenos Aires.
How complex are international corporate structures?
If you want to understand how complex multinational companies are, consider this:
In Hong Kong, there's a company called Goldman Sachs Structured Products (Asia) Limited. It's controlled by another company called Goldman Sachs (Asia) Finance, registered in Mauritius.
That's controlled by a company in Hong Kong, which is controlled by a company in New York, which is controlled by a company in Delaware, and that company is controlled by another company in Delaware called GS Holdings (Delaware) L.L.C.
Which itself is a subsidiary of the only Goldman you're likely to have heard of, The Goldman Sachs Group in New York City.
That's only one of hundreds of such chains. All told, Goldman Sachs consists of more than 4000 separate corporate entities all over the world, some of which are around ten layers of control below the New York HQ.
Of those companies approximately a third are registered in nations that might be described as tax havens. Indeed, in the world of Goldman Sachs, the Cayman Islands are bigger than South America, and Mauritius is bigger than Africa.
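A control chain like the one above is just a parent-pointer structure, and finding the ultimate parent is a simple loop. The map below is a shortened, illustrative version of the chain described, not a complete or authoritative record:

```python
# Simplified, illustrative ownership map: subsidiary -> controlling entity.
# (Names abbreviated from the chain described above; several intermediate
# entities are omitted for brevity.)
controlled_by = {
    "GS Structured Products (Asia) Ltd [HK]": "GS (Asia) Finance [Mauritius]",
    "GS (Asia) Finance [Mauritius]": "GS Holdings (Delaware) L.L.C. [US]",
    "GS Holdings (Delaware) L.L.C. [US]": "The Goldman Sachs Group [NY]",
}

def ultimate_parent(entity, controlled_by):
    """Follow control links upward until an entity with no parent is reached."""
    depth = 0
    while entity in controlled_by:
        entity = controlled_by[entity]
        depth += 1
    return entity, depth

top, layers = ultimate_parent(
    "GS Structured Products (Asia) Ltd [HK]", controlled_by
)
```

Run over thousands of entities, exactly this kind of traversal is what lets you count chains, measure their depth, and tally how many links pass through tax havens.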
These are maps of the top five banking companies in the US, and are based on publicly available data from the Federal Reserve. Read more about our data on the link at the top left.
I want to understand aid flows into my country / from my country / from a certain donor / globally.
I want to find aid data.
What is ‘open development’?
Open Development sits at the intersection of the ‘open’ movement, and international development. This could take the form of looking at how open data can affect decisions made within international development; open access to research materials; or opening up the ways we work, for example by being more inclusive, to name just a few examples. If you want to find out more about open development, join the Open Development mailing list, where a wide range of topics within Open Development are discussed. It’s open to everyone to join!
CitySDK is creating a toolkit for the development of digital services within cities. The toolkit comprises open and interoperable digital service interfaces as well as processes, guidelines and usability standards. CitySDK enables more efficient use of the expertise and know-how of developer communities in city service development.
Apps and tools for CitySDK are developed in cooperation with the Code for Europe fellows (see www.codeforeurope.net)
BigML's goal is to create a machine learning service that is extremely easy to use and seamless to integrate.
Putting the power of Machine Learning in your hands.
Our goal is to make machine learning simple and beautiful.
Our service takes the complexities out of creating a high-availability, low-latency machine learning system built especially for your data. You will not only gain valuable insights from your data; you will most likely enjoy the process.
From the developer, to the researcher, to the multinational corporation, BigML has something that can uncover the hidden predictive power of your data.
The GDELT Project is a realtime network diagram and database of global human society for open research.
Watching The Entire World
GDELT monitors the world's news media from nearly every corner of every country in print, broadcast, and web formats, in over 100 languages, every moment of every day.
GDELT monitors print, broadcast, and web news media in over 100 languages from across every country in the world to keep continually updated on breaking developments anywhere on the planet. Its historical archives stretch back to January 1, 1979 and update daily (soon to be every 15 minutes). Through its ability to leverage the world's collective news media, GDELT moves beyond the focus of the Western media towards a far more global perspective on what's happening and how the world is feeling about it.
Querying, Analyzing and Downloading
The entire GDELT database is 100% free and open: you can download the raw data files, visualize them using the GDELT Analysis Service, or analyze them at limitless scale with Google BigQuery.
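The raw data files are tab-delimited event records, so working with a downloaded extract takes only a few lines. The field positions used below (event ID, date, actors, Goldstein scale) are an assumption loosely following the GDELT 1.0 event format, and the sample row is invented and heavily truncated; consult the official GDELT codebook for the real column layout:

```python
import csv
import io

# Invented, truncated example row in the assumed column order:
# event ID, date, actor 1, actor 2, Goldstein scale.
sample = "123456789\t20140101\tUSA\tRUS\t2.4\n"

def parse_events(text):
    """Parse tab-delimited event rows into dictionaries."""
    events = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        events.append({
            "event_id": row[0],
            "date": row[1],
            "actor1": row[2],
            "actor2": row[3],
            "goldstein": float(row[4]),
        })
    return events

events = parse_events(sample)
```

The same reader works unchanged on a full daily export once you map the real column indices from the codebook.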
This is just the beginning. There are also Big Data startups that are developing personal data marketplaces. These young companies are taking a different approach regarding Big Data and are empowering consumers to determine what’s done with their data and receive monetary rewards for the usage of their data.
One such company is Handshake. They are working hard to cut out data brokers such as Experian or Acxiom and give consumers power over their personal data. End-users are given monetary rewards in exchange for their data and a bit of their time. Users can share the usual personal information as well as more detailed personal information about their hobbies and life. The more data is shared and the more time spent with Handshake, the more money consumers can make. According to Duncan White, CEO of Handshake, an individual can make up to $24,000 per year through the platform. Of course this requires substantial time and dedication to the platform, but it is an interesting business model.
Another new startup is Ctrlio. This company is developing a platform that lets individuals take more control of their own data, decide what to do with it, and save money too via personalized offers. The advantage for brands is that they can make highly relevant offers based on rich personal profiles, resulting in higher conversion rates.
A third Big Data startup targeting personal data is Datacoup. Currently they are running a beta where they offer users $8 a month in return for their data.
The Open Data 500 is the first comprehensive study of U.S. companies that use open government data to generate new business and develop new products and services.
- Provide a basis for assessing the economic value of government open data
- Encourage the development of new open data companies
- Foster a dialogue between government and business on how government data can be made more useful
The Govlab’s Approach
The Open Data 500 study is conducted by the GovLab at New York University with funding from the John L. and James S. Knight Foundation. The GovLab works to improve people’s lives by changing how we govern, using technology-enabled solutions and a collaborative, networked approach. As part of its mission, the GovLab studies how institutions can publish the data they collect as open data so that businesses, organizations, and citizens can analyze and use this information.
The GovLab is now planning to use this study’s findings to convene a series of roundtables between government agencies and businesses that use their data to help improve the processes and priorities for data release. The Department of Commerce has committed to participate in the first discussion; other federal Departments have expressed an intent to participate in future roundtables.
In addition to our work in the U.S., we are now in discussions with representatives of several national governments and international organizations about the potential to replicate the Open Data 500 study in other countries.