I provide an overview of the data science workflow and highlight some challenges that data scientists face in their work.
What do data scientists do at work, and what challenges do they face?
This post provides an overview of the modern data science workflow, adapted from Chapter 2 of my Ph.D. dissertation, Software Tools to Facilitate Research Programming.
The figure below shows the steps involved in a typical data science workflow. There are four main phases, shown in the dotted-line boxes: preparation of the data, alternating between running the analysis and reflection to interpret the outputs, and finally dissemination of results in the form of written reports and/or executable code.
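The four phases can be sketched as a simple iterative loop. This is a minimal illustration of the structure, not the dissertation's actual tooling; all function names and the toy analysis are hypothetical:

```python
# Sketch of the four-phase workflow: preparation -> (analysis <-> reflection)
# -> dissemination. Every name and computation here is illustrative only.

def prepare(raw):
    """Preparation: clean the raw records (here, drop missing values)."""
    return [r for r in raw if r is not None]

def analyze(data, threshold):
    """Analysis: run a computation, e.g. count values above a threshold."""
    return sum(1 for x in data if x > threshold)

def reflect(result, data):
    """Reflection: inspect the output and decide whether to iterate."""
    # Toy stopping rule: at least half of the data should match.
    return result >= len(data) / 2

def workflow(raw):
    data = prepare(raw)
    threshold = max(data)
    result = analyze(data, threshold)
    while not reflect(result, data):   # alternate analysis and reflection
        threshold -= 1                 # adjust parameters and re-run
        result = analyze(data, threshold)
    # Dissemination: report the final parameters and result.
    return {"threshold": threshold, "matches": result}

print(workflow([5, None, 3, 8, None, 6, 2]))
```

The point of the loop is that analysis and reflection alternate until the analyst is satisfied, which matches the back-and-forth arrows in the figure.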
OKF Belgium are thrilled to announce a CKAN extension, integrating DataTank functionality into CKAN.
The Governance Lab (The GovLab), a research institution at New York University, released the beta version of its Open Data 500 project, an initiative designed to identify, describe, and analyze companies that use open government data in order to study how these data can serve business needs more effectively. As part of this effort, the organization is compiling a list of 500+ companies that use open government data to generate new business and develop new products and services.
Not only does student participation decline dramatically throughout the new generation of web-based courses, but the involvement of teachers in online discussions makes it worse.
An animated map of global wind conditions.
A visualization of global weather conditions forecast by supercomputers updated every three hours.
Renewable energy forecaster 3TIER has made its wind and solar annual averages available via the International Renewable Energy Agency (IRENA)’s Global Renewable Energy Atlas, an open-access online platform.
Open Data Center Alliance (ODCA), the global organization where members work together to advance the deployment of enterprise cloud solutions and services that are interoperable, secure and free of vendor lock-in, today announced the release of the Data Management for Information as a Service Usage Model. This focused usage model defines the tasks necessary to successfully manage data when using information-as-a-service. It starts with an introduction to Information as a Service (Info-aaS) and then describes data management in four stages. The requirements have been developed to help organizations identify data management opportunities and risks, and as a resource organizations can use to simplify the challenges involved in harnessing and leveraging data from across the entire extended enterprise.
A new report released today by Knight titled "The Emergence of Civic Tech: Investments in a Growing Field" aims to advance the movement by providing a starting place for understanding activity and investment in the sector. The report identifies more than $430 million of private and philanthropic investment directed to 102 civic tech organizations from January 2011 to May 2013. In total, the analysis identifies 209 civic tech organizations that cluster around pockets of activity such as tools that improve government data utility, community organizing platforms and online neighborhood forums. Along with the report, we’ve developed an interactive data visualization tool with the help of Fathom to explore the network of civic tech organizations and their connections to one another.
The goal of the Open Scholar Foundation is to improve the efficiency of scholarly communication by providing incentives for researchers to openly share their digital research artifacts, including manuscripts, data, protocols, source code, and lab notes.
Albuquerque isn't just home to Breaking Bad's Walter White; it's also a hub of open data innovation.
StateTech has already detailed that the Albuquerque Police Department's Real Time Crime Center aggregates dozens of databases, live video feeds and GIS files to deliver nearly real-time information to law enforcement officials. The project, which has been operational for less than a year, is already helping officers perform their jobs and keep safe.
LibHack is a library hackathon that will take place on January 24, 2014 from 9:30am-5:00pm in the Special Collections Center on the 6th floor of the University of Pennsylvania’s Van Pelt Library. The event, sponsored by the LITA Library Code Year Interest Group, OCLC, and the Digital Public Library of America (DPLA), features opportunities for beginning, intermediate, and advanced programmers to create something and improve their coding skills.
SmartOpenData will create a Linked Open Data infrastructure (including software tools and data) fed by public and freely available data resources, and by existing sources for biodiversity and environmental protection and research in rural areas, European protected areas, and their national parks. This will provide opportunities for SMEs to generate new innovative products and services that can lead to new businesses in the environmental, regional decision-making and policy areas, among others. The value of the data will be greatly enhanced by making it available through a common query language that gives access to related datasets available in the linked open data cloud. The commonality of data structure and query language will overcome the monolingual nature of typical datasets, making them available in multiple languages.
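The "common query language" for Linked Open Data is presumably SPARQL, the W3C standard for querying RDF datasets. The sketch below shows how the same structural query can be parameterized by language, which is what makes a shared query language overcome monolingual datasets; the property IRIs and the protected-area pattern are illustrative assumptions, not taken from the SmartOpenData project itself:

```python
# Illustrative sketch: build a language-parameterized SPARQL query.
# The vocabulary (rdfs:label, dct:subject) is standard, but the overall
# query pattern is a hypothetical example, not SmartOpenData's schema.

PREFIXES = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct:  <http://purl.org/dc/terms/>
"""

def protected_area_query(lang="en"):
    """Return a SPARQL query for protected-area labels in one language."""
    return PREFIXES + f"""
SELECT ?area ?label WHERE {{
  ?area dct:subject ?category ;
        rdfs:label ?label .
  FILTER (lang(?label) = "{lang}")
}}
LIMIT 10
"""

# The identical query structure serves English, Spanish, or any other
# language the dataset provides labels for:
print(protected_area_query("es"))
```

Because only the language tag in the FILTER changes, one query template can serve every language a multilingual dataset carries, which is the commonality the paragraph above describes.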
"Linked Open Data is becoming a source of unprecedented visibility for environmental data that will enable the generation of new businesses as well as a significant advance for research in the environmental area. Nevertheless, in order for this envisioned strategy to become a reality, it is necessary to advance the publication of existing environmental data, most of which is owned by public bodies."
Although crowdsourcing has been used for many years, it is only now becoming a hot topic in science. Read on to learn what it is and how crowdsourcing science can help you.
Visualization continues to mature and to focus more on the data than on novel designs and sheer size. People improved on existing forms and got better at analysis. Readers seemed more ready and eager to explore more data at a time. Fewer spam graphics landed in my inbox.
The Hyperaudio ecosystem is made up of independent parts that flow together, made to ease the process of creating media from scratch.
The growing availability of open data -- freely accessible, machine-readable information -- produced by governments and institutions is triggering a new wave of economic stimulus. By one estimate, open data has the potential to unlock between $3.2 trillion and $5.4 trillion in additional economic value annually across a variety of industries, according to a McKinsey Global Institute report released in October.
The Open Science Framework (OSF) is part network of research materials, part version control system, and part collaboration software. The purpose of the software is to support the scientist's workflow and help increase the alignment between scientific values and scientific practices.
Journals, funders and scientific societies can use the OSF as back-end infrastructure for preregistration, data and materials archiving, and other administrative functions. Email firstname.lastname@example.org.
The open scientist proactively ensures that published research is freely and conveniently available to all. Ideally, the open scientist releases research under a license like Creative Commons BY that explicitly allows use in derivative works as long as attribution is given.
On May 9, 2013, President Obama signed an Executive Order, Making Open and Machine Readable the New Default for Government Information, directing historic steps to make government-held data more accessible to the public, entrepreneurs, and others as fuel for innovation, economic growth, and government efficiency.
Over a dozen agencies have launched webpages at agency.gov/data, making it easier for the public to find, understand, and use government data. Many agencies have released—and will continue to release—new datasets, which are now available both on agencies’ public data webpages and on Data.gov.
Leading academic journals are distorting the scientific process and represent a "tyranny" that must be broken, according to a Nobel prize winner who has declared a boycott on the publications.
The Open Data movement is about many things – transparency, accountability, even democracy – but at one level, it’s also about value for money. City, state and national governments spend taxpayers’ money to pay for data collection, whether they’re conducting a countrywide census, launching weather satellites, or tracking the movements of city buses. By simple logic, Open Data advocates have argued that taxpayers should have free, open access to the data they’ve paid for, with exceptions made for data that needs to be protected for privacy or security reasons.
"A core problem is that government agencies have taken what I’d call a supply-side approach to Open Data: They’ve chosen what data to release without much input from the people it’s supposed to serve.
Daniel Kaufmann of Revenue Watch has called this the problem of “Zombie Data” – data that exists without purpose or any real use.
Under Communications Minister Malcolm Turnbull’s watch, the way we engage with government agencies is set to go digital by default.
Speaking via prerecorded video at the GovInnovate conference in Canberra last week, Minister Turnbull issued an unequivocal call to action to the Australian Public Service to improve the quantity of government services delivered online, and enrich their quality, depth and level of engagement with citizens.
What does open data / open knowledge have to do with Crisismapping? Everything. In times of crisis, we live in an open data / open government ecosystem. We seek, build and make it happen in real time – talk converts to action quickly.
How can councils use data, and what are the best examples already out there? Read the comments of our expert panel on the subject.
This is the text of a talk I gave at the (wonderful) National Digital Forum in Wellington, New Zealand on November 27th, 2013. You can also find my slides here. Hi there. Thanks for inviting me to NDF 2013, it is a real treat and honor to be here.