Scooped by
Shiwon Cho
What data structure is more sacred than the linked list? If we get rid of it, what silly interview...
Intel® SDK for OpenCL* Applications 2013 is a comprehensive software development environment for OpenCL applications on 3rd generation and upcoming 4th generation Intel® Core™ processors, which support OpenCL 1.2 on the Windows 7* and Windows 8* operating systems.
PARALUTION is a library for sparse iterative methods with a special focus on multi-core and accelerator technology such as GPUs. In particular, it incorporates fine-grained parallel preconditioners designed to exploit modern multi-/many-core devices. Based on C++, it provides a generic and flexible design and interface that allow seamless integration with other scientific software packages. The library is open source and released under the GPL.
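PARALUTION's own API is not shown here, so the following is only a generic sketch, assuming a Jacobi iteration over a matrix stored in CSR (compressed sparse row) format; the names `CsrMatrix` and `jacobi` are hypothetical and not part of PARALUTION. It illustrates why such methods suit multi-core and accelerator hardware: each row update reads only the previous iterate, so all rows can be processed in parallel.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Compressed Sparse Row (CSR) storage for a square sparse matrix (hypothetical type).
struct CsrMatrix {
    std::size_t n;                     // number of rows (and columns)
    std::vector<std::size_t> row_ptr;  // size n + 1; row i occupies [row_ptr[i], row_ptr[i + 1])
    std::vector<std::size_t> col_idx;  // column index of each stored nonzero
    std::vector<double> values;        // value of each stored nonzero
};

// Jacobi iteration: x_new[i] = (b[i] - sum_{j != i} A[i][j] * x[j]) / A[i][i].
// x holds the initial guess on entry and the result on exit (size A.n).
// Returns the number of iterations performed.
int jacobi(const CsrMatrix& A, const std::vector<double>& b,
           std::vector<double>& x, int max_iter, double tol) {
    std::vector<double> x_new(A.n);
    for (int it = 1; it <= max_iter; ++it) {
        double max_change = 0.0;
        // Each row reads only the old iterate x, so this loop is parallelizable.
        for (std::size_t i = 0; i < A.n; ++i) {
            double diag = 0.0;
            double sum = b[i];
            for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
                if (A.col_idx[k] == i) {
                    diag = A.values[k];
                } else {
                    sum -= A.values[k] * x[A.col_idx[k]];
                }
            }
            assert(diag != 0.0 && "Jacobi requires a nonzero diagonal");
            x_new[i] = sum / diag;
            max_change = std::max(max_change, std::fabs(x_new[i] - x[i]));
        }
        x.swap(x_new);
        if (max_change < tol) return it;  // converged
    }
    return max_iter;
}
```

For the 3×3 system with rows (4, 1, 0), (1, 4, 1), (0, 1, 4) and right-hand side (5, 6, 5), the iterate converges to (1, 1, 1). Jacobi is chosen here only for brevity; it converges for suitable matrices such as strictly diagonally dominant ones, and production libraries typically pair Krylov solvers (e.g. CG or GMRES) with preconditioners.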
This site contains Java bindings for CUDA, CUBLAS, CUFFT, CUDPP, CURAND and CUSPARSE.
This document is a design and coding guide for developing high-performance OpenCL applications for the Intel® Xeon Phi™ coprocessor. It takes you from the Intel Xeon Phi coprocessor architecture and microarchitecture through the key OpenCL constructs, and shows you how to use them efficiently to make the best use of the coprocessor hardware. Since exploiting hardware parallelism is essential for high-performance applications, we will show you how to improve the parallelism of your OpenCL application on the Intel Xeon Phi coprocessor. With this knowledge, you will be ready to design and program your application to perform best on the Intel Xeon Phi coprocessor through OpenCL.
Jacket enables GPU computing for MATLAB® code. The new version, v2.3, includes performance improvements and new support for CUDA 5.0, which enables computation on the latest Kepler K20 GPUs of the NVIDIA Tesla product line.
In this blog post, I will share a C++ AMP implementation of a fractal generator that renders 4-dimensional Quaternion Julia fractals. I'll show you screenshots of the app, then we'll dive into the code, and then I'll share a pointer to where you can get the Visual Studio project.
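The post's own code is not reproduced here. As a rough sketch of the core iteration, assuming the standard escape-time definition q(n+1) = q(n)² + c over quaternions, the function below counts how many steps a starting point takes to leave the radius-2 ball; `Quat`, `square`, `norm2` and `julia_escape` are hypothetical names. Because every sample point is evaluated independently, this is the kind of loop body a C++ AMP `parallel_for_each` would run once per pixel or sample.

```cpp
// A quaternion w + x*i + y*j + z*k.
struct Quat {
    double w, x, y, z;
};

// q^2 for q = w + v (v the vector part) is (w^2 - |v|^2) + 2*w*v,
// because v*v = -|v|^2 for a pure-vector quaternion.
Quat square(const Quat& q) {
    return {q.w * q.w - q.x * q.x - q.y * q.y - q.z * q.z,
            2.0 * q.w * q.x, 2.0 * q.w * q.y, 2.0 * q.w * q.z};
}

// Squared magnitude |q|^2.
double norm2(const Quat& q) {
    return q.w * q.w + q.x * q.x + q.y * q.y + q.z * q.z;
}

// Escape-time test for the quaternion Julia set with constant c.
// Returns the iteration at which |q| first exceeds 2, or max_iter if it stays bounded.
int julia_escape(Quat q, const Quat& c, int max_iter) {
    for (int i = 0; i < max_iter; ++i) {
        if (norm2(q) > 4.0) return i;  // standard bailout radius 2 (valid for |c| <= 2)
        const Quat s = square(q);
        q = {s.w + c.w, s.x + c.x, s.y + c.y, s.z + c.z};
    }
    return max_iter;
}
```

Points that do not escape within `max_iter` steps are treated as inside the set. A full renderer additionally fixes one quaternion component to obtain a 3D slice and shades it, often by ray marching; that part is omitted here.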
The Computing Language Utility (CLU) is a lightweight API designed to help programmers explore, learn, and rapidly prototype programs with OpenCL. This API reduces the complexity associated with initializing OpenCL devices, contexts, kernels, parameters, etc., while preserving the ability to drop down to the lower-level OpenCL API at will when programmers want to get their hands dirty. The CLU release includes an open-source implementation along with documentation and samples that demonstrate how to use CLU in real applications.
Hi, I am Daniel Moth. This screencast assumes knowledge of the C++ AMP API, e.g. that you fully understand the matrix multiplication implementation in C++ AMP.
SnuCL is an OpenCL framework and freely available, open-source software developed at Seoul National University. It naturally extends the original OpenCL semantics to the heterogeneous cluster environment.
In this work, we describe a simple and powerful method to implement real-time multi-agent path-finding on Graphics Processing Units (GPUs). The technique aims to find potential paths for many thousands of agents, using the A* algorithm and an input grid map partitioned into blocks. We propose a GPU implementation that uses a search-space decomposition approach to break down the forward-search A* algorithm into parallel, independent forward sub-searches. We show that this approach fits well with the programming model of GPUs, enabling planning for many thousands of agents in parallel in real-time applications such as computer games and robotics. The paper describes this implementation using the Compute Unified Device Architecture (CUDA) programming environment, and demonstrates its performance advantages over a GPU implementation of Real-Time Adaptive A*.
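The paper's GPU decomposition is not reproduced here. As a point of reference, a minimal serial A* on a 4-connected grid with unit step costs and a Manhattan-distance heuristic might look like the sketch below; `astar_grid` and its grid encoding ('#' for blocked cells) are hypothetical. The approach described above would split a single forward search like this one into independent sub-searches over blocks of the map.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Grid: '#' is blocked, any other character is walkable. Moves are 4-connected, cost 1.
// Returns the length of the shortest path from (sr, sc) to (gr, gc), or -1 if none exists.
int astar_grid(const std::vector<std::string>& grid, int sr, int sc, int gr, int gc) {
    assert(!grid.empty() && !grid[0].empty());
    const int rows = static_cast<int>(grid.size());
    const int cols = static_cast<int>(grid[0].size());
    // Manhattan distance: admissible and consistent for 4-connected unit-cost moves.
    auto h = [&](int r, int c) { return std::abs(r - gr) + std::abs(c - gc); };

    std::vector<int> g(rows * cols, -1);  // best known cost from start; -1 = not reached
    using Node = std::pair<int, int>;     // (f = g + h, cell index)
    std::priority_queue<Node, std::vector<Node>, std::greater<Node>> open;  // min-heap on f

    g[sr * cols + sc] = 0;
    open.push({h(sr, sc), sr * cols + sc});
    const int dr[] = {1, -1, 0, 0};
    const int dc[] = {0, 0, 1, -1};

    while (!open.empty()) {
        const int f = open.top().first;
        const int idx = open.top().second;
        open.pop();
        const int r = idx / cols, c = idx % cols;
        if (f - h(r, c) > g[idx]) continue;     // stale entry: a cheaper path was found later
        if (r == gr && c == gc) return g[idx];  // goal popped: its cost is optimal
        for (int k = 0; k < 4; ++k) {
            const int nr = r + dr[k], nc = c + dc[k];
            if (nr < 0 || nr >= rows || nc < 0 || nc >= cols || grid[nr][nc] == '#') continue;
            const int nidx = nr * cols + nc;
            const int ng = g[idx] + 1;
            if (g[nidx] == -1 || ng < g[nidx]) {
                g[nidx] = ng;
                open.push({ng + h(nr, nc), nidx});
            }
        }
    }
    return -1;  // goal unreachable
}
```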
Image processing is a computational task that lends itself very well to GPU compute scenarios. In many cases the most commonly used algorithms are inherently massively parallel, with each pixel in the image being processed independently from the others. As a result, image processing toolkits have been early adopters of the new GPGPU programming model.
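To make the "each pixel is processed independently" point concrete, here is a minimal sketch, assuming an interleaved 8-bit RGB buffer, that converts an image to grayscale with the ITU-R BT.601 luma weights; `Rgb` and `to_grayscale` are hypothetical names. No iteration reads another pixel's result, so on a GPU each loop iteration could become one thread or OpenCL work-item.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One interleaved 8-bit RGB pixel (hypothetical layout).
struct Rgb {
    std::uint8_t r, g, b;
};

// Converts RGB pixels to 8-bit grayscale using BT.601 luma weights.
// Each output pixel depends only on the matching input pixel, so the
// iterations are independent and could run in any order or in parallel.
std::vector<std::uint8_t> to_grayscale(const std::vector<Rgb>& pixels) {
    std::vector<std::uint8_t> out(pixels.size());
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        const Rgb& p = pixels[i];
        const double y = 0.299 * p.r + 0.587 * p.g + 0.114 * p.b;  // weights sum to 1
        out[i] = static_cast<std::uint8_t>(y + 0.5);                // round to nearest
    }
    return out;
}
```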
Along with the rise of General-Purpose computing on Graphics Processing Units (GPGPU), GPUs themselves are evolving rapidly from fixed-function rasterization engines into more general processors. Today, discrete GPUs are typically connected to the CPU via the PCI Express* (PCIe) bus, which significantly limits the data transfer rate between the devices. Explicit boundaries between memory spaces and hierarchies, together with high-latency synchronization between devices, result in a rather coarse-grained level of abstraction. Most OpenCL workloads today target the GPU only, leaving the CPU to do mainly scheduling, file and network I/O, and other "host" types of orchestration. In this approach, the cost of PCIe transfers can be prohibitive if tasks are small and the transfers are not well amortized by the GPU's execution speed.
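The amortization argument can be written as a simple break-even check: offloading to a discrete GPU helps only if transfer time plus GPU compute time is less than CPU compute time. The sketch below assumes the caller supplies measured or estimated values; the `OffloadEstimate` struct, its fields, and the linear transfer model (fixed latency plus bytes divided by bandwidth) are simplifying assumptions for illustration, not figures from the article.

```cpp
#include <cassert>

// Hypothetical inputs, all supplied by the caller (seconds, bytes, bytes/second).
struct OffloadEstimate {
    double bytes_moved;         // host->device plus device->host bytes
    double pcie_bytes_per_s;    // effective PCIe bandwidth
    double transfer_latency_s;  // fixed per-task overhead (transfers, synchronization)
    double gpu_compute_s;       // kernel time on the discrete GPU
    double cpu_compute_s;       // time to do the same work on the CPU
};

// Simple linear model: offloading pays off when
// latency + bytes / bandwidth + gpu_time < cpu_time.
bool offload_pays_off(const OffloadEstimate& e) {
    assert(e.pcie_bytes_per_s > 0.0);
    const double transfer_s = e.transfer_latency_s + e.bytes_moved / e.pcie_bytes_per_s;
    return transfer_s + e.gpu_compute_s < e.cpu_compute_s;
}
```

For example, moving 1 MB over an effective 8 GB/s link with 20 µs of fixed overhead costs about 145 µs under this model, so a task that takes only 100 µs on the CPU cannot benefit, however fast the kernel is.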
As NVIDIA's GPU Technology Conference 2013 kicks off this week, a number of announcements will be coming down the pipeline from NVIDIA and its partners. The biggest and most important of these will come Tuesday morning with NVIDIA CEO Jen-Hsun Huang's keynote speech, while some other product announcements, such as this one, are being released today with the start of the show.
For many programmers, sorting data in parallel means implementing a state-of-the-art algorithm in their preferred programming language. However, most programming languages have a good serial sorting function in their standard library. It seems to me that the obvious first step is to try what your language's library provides. If this approach is not successful, you should look for an existing library that is used, and consequently well debugged, by other programmers. Only as a last resort should you implement a new sorting algorithm from scratch.
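Following that advice, a first attempt in C++ can be built entirely from standard-library pieces. The sketch below, with the hypothetical name `parallel_sort_two_halves`, sorts the two halves of a vector concurrently with `std::sort` and then combines them with `std::inplace_merge`; it uses only two threads and is meant as an illustration, not a tuned parallel sort.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

// Sorts v using standard-library building blocks: the two halves are sorted
// concurrently with std::sort, then merged in place.
void parallel_sort_two_halves(std::vector<int>& v) {
    const auto mid = v.begin() + static_cast<std::ptrdiff_t>(v.size() / 2);
    // Sort the left half on another thread while this thread sorts the right half.
    // The halves do not overlap, so there is no data race.
    auto left = std::async(std::launch::async, [&v, mid] { std::sort(v.begin(), mid); });
    std::sort(mid, v.end());
    left.get();  // wait for the left half (and propagate any exception)
    std::inplace_merge(v.begin(), mid, v.end());
}
```

Since C++17, `std::sort` also accepts an execution policy such as `std::execution::par`, which is the more direct way to ask the standard library for a parallel sort; with GCC's libstdc++ this typically requires linking against Intel TBB.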
Rob Farber takes you on a tour of the paths to massively parallel x86, MultiGPU, and CPU+GPU applications.
GacUI is a GPU-accelerated user interface library for the C++ programming language. It provides features similar to WPF's, although some features, such as dependency properties, are limited by C++. Here are its main features:
AMD CodeXL is a new unified developer tool suite that enables developers to harness the benefits of CPUs, GPUs and APUs. It includes powerful GPU debugging, comprehensive GPU and CPU profiling, and static OpenCL™ kernel analysis capabilities, enhancing accessibility for software developers to enter the era of heterogeneous computing. AMD CodeXL is available for free, both as a Visual Studio® extension and a standalone user interface application for Windows® and Linux®.
The Computing Language Utility is an open-source, lightweight API designed to help programmers explore, learn, and rapidly prototype programs with OpenCL*. This API reduces the complexity associated with initializing OpenCL* devices, contexts, kernels, parameters, etc., while preserving the ability to drop down to the lower-level OpenCL* API when programmers want to get their hands dirty. The CLU release includes an open-source implementation along with documentation and samples that demonstrate how to use CLU in real applications.
The MOSIX group announces the release of the Virtual OpenCL (VCL) cluster platform version 1.14. This version includes the SuperCL extension that allows micro OpenCL programs to run efficiently on devices of remote nodes.
The 3rd generation Intel® Core™ processor family now supports OpenCL* 1.1 for both the CPU and Intel® HD Graphics.
Even before its name was unveiled last week, and ahead of its official announcement later in the year, Intel's Xeon Phi shook the supercomputer world by supposedly winning some of the upcoming large projects, mostly at the expense of Nvidia Tesla.