2010 Annual Report
- Author
Howard Butler
- Contact
- Date
12/21/2010
The following document is a synopsis of major development activities that have taken place in the libLAS project (or related projects) in the 2010 calendar year.
Vertical Datum Reprojection and Transformation
Frank Warmerdam implemented vertical datum reprojection and transformation throughout the open source GIS stack (proj.4, GDAL, and libgeotiff) in the past year. This work makes it possible to perform vertical datum transformations via command-line utilities like las2las, and it gives software developers the tools to implement these features in their own software.
libLAS Processing Kernel
libLAS gained something I call the “libLAS Processing Kernel” in the past year. It’s really just a set of common functions that all application-level software can reuse to implement filtering and transformation on LAS data. Collecting this functionality in a common place means the same operations are reused across many of the libLAS utilities, including las2las, lasinfo, and las2txt. New features implemented in the kernel include:
Color fetching from GDAL raster data sources
Data reprojection and vertical datum transformation
Numerous filtering operations
Header modification
These utilities are available to all software developers who wish to reuse them in their own tools.
Re-engineering of las2las, las2txt, and lasinfo
las2las, las2txt, and lasinfo were re-imagined in light of the development of the libLAS Processing Kernel to take advantage of new functionality and to regularize command-line argument handling and parsing. The previous versions of the utilities have been preserved under the las2las-old, las2txt-old, and lasinfo-old monikers in case people have significant processing workflows developed with them. It would be advantageous to upgrade to the new versions in many cases, both for significantly improved functionality and for a speed improvement that is sometimes double that of the -old versions.
Some new features the utilities gained as part of this effort include:
Setting color information from GDAL rasters
Splitting files based on a point count or a file size in MB
Chaining many filter operations together into a single call
Modifying header information, including setting coordinate system info
Summarizing data more fully and more flexibly (XML, per-point)
Chipper
Andrew Bell developed a specialized point partitioning process called lasblock to bucketize point data. The process aims to optimize the fill capacity, shape, and speed of processing. More specifically, it attempts to keep the blocks as full as possible and as square as possible to improve querying characteristics for Oracle Point Cloud. This pre-processing is needed as a precursor step in the processing chain that ends with actually loading the data into Oracle via las2oci. lasblock can also be used as a LAS tiling process, although it is not especially memory efficient.
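The goal of keeping blocks both full and roughly square can be illustrated with a small standalone sketch. This is not lasblock’s actual algorithm. It is a simplified recursive split, assuming 2D points held in memory, that cuts along the wider axis and always hands the first half a whole multiple of the block capacity. The names Pt, Block, Partition, and Chip are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2D point; a real LAS point carries many more fields.
struct Pt {
    double x;
    double y;
};

using Block = std::vector<Pt>;
using PtIter = std::vector<Pt>::iterator;

// Recursively split [first, last) into blocks of at most `capacity` points.
void Partition(PtIter first, PtIter last, std::size_t capacity,
               std::vector<Block>& out) {
    const std::size_t n = static_cast<std::size_t>(last - first);
    if (n == 0) return;
    if (n <= capacity) {
        out.emplace_back(first, last);
        return;
    }

    auto by_x = [](const Pt& a, const Pt& b) { return a.x < b.x; };
    auto by_y = [](const Pt& a, const Pt& b) { return a.y < b.y; };

    // Measure the extent of this subset so the cut goes across the wider
    // axis, which nudges the resulting blocks toward a square shape.
    const auto xs = std::minmax_element(first, last, by_x);
    const auto ys = std::minmax_element(first, last, by_y);
    const double width = xs.second->x - xs.first->x;
    const double height = ys.second->y - ys.first->y;

    // Give the first half a whole number of blocks' worth of points so that
    // only the final block of the whole partition can end up partially full.
    const std::size_t blocks = (n + capacity - 1) / capacity;
    const std::size_t left_count = (blocks / 2) * capacity;
    const PtIter mid = first + static_cast<std::ptrdiff_t>(left_count);

    if (width >= height) {
        std::nth_element(first, mid, last, by_x);
    } else {
        std::nth_element(first, mid, last, by_y);
    }

    Partition(first, mid, capacity, out);
    Partition(mid, last, capacity, out);
}

// Entry point: takes a copy of the points because they get reordered.
std::vector<Block> Chip(std::vector<Pt> points, std::size_t capacity) {
    assert(capacity > 0 && "blocks must hold at least one point");
    std::vector<Block> blocks;
    Partition(points.begin(), points.end(), capacity, blocks);
    return blocks;
}
```

With ten points and a capacity of four, this sketch yields two full blocks and one block of two points. It holds every point in memory at once, which echoes the memory caveat above.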
Indexing
Gary Huber developed an octree-based spatial index for libLAS to speed up random, bounding-box-based queries to LAS files. It is released as part of libLAS 1.6, but its full implementation within the library is not yet complete. The index can store its data within VLRs (which requires a file rewrite) or in a separate file alongside the .las file.
CMake
libLAS was migrated to using CMake for its configuration system. CMake allows easy generation of MSVC project files, Xcode project files, and makefiles under a common configuration. This effort eliminated three parallel build system configurations (MSVC projects, GNU autoconf, MSVC makefiles) and provided more flexibility for packaging, testing, and build types. In my opinion, its use has been a boon to the project.
OSGeo4W
For the first time ever, we have released fully-capable Windows libLAS packages in the form of an OSGeo4W release. These releases contain the full range of libLAS functionality including coordinate system support, Oracle support, vertical datum transformation, and chipping. Head to http://trac.osgeo.org/osgeo4w to obtain your copy and start testing libLAS immediately.
New Website
We rewrote the libLAS website and transformed it from a bunch of wiki pages in Trac to a Sphinx-backed HTML website. We have added much more documentation, provided it in formats such as PDF, and organized the content significantly better.
Generic LAS Schema Support
The standard specifically allows extra data to be stored with each point after the requisite PointFormat data, although this is not widely implemented. There is neither a regularized way to describe these data nor a way to capture metadata about them. To this end, I have proposed an XML schema document that could be stored in a VLR, as well as schema-aware reader and writer implementations that can use that VLR to work with the data. See <https://lidarbb.cr.usgs.gov/index.php?showtopic=9075> for more details on the initial proposal of schema support.
libLAS now implements a class called liblas::Schema that is driven by the Point Data Format ID of the header in addition to any extra dimensions you wish to store with the point. This work is used for both the Oracle Point Cloud effort and upcoming LASzip compression integration.
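To make the idea of a point-format-driven layout with extra dimensions concrete, here is a minimal standalone sketch. It does not use liblas::Schema or reproduce its API. RecordLayout and ExtraDimension are hypothetical names, and the base record sizes are the ones the LAS specification gives for point formats 0 through 3.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical description of one user-defined dimension.
struct ExtraDimension {
    std::string name;
    std::size_t size_in_bytes;
};

// Hypothetical record layout: a standard point format plus extra bytes.
class RecordLayout {
public:
    explicit RecordLayout(int point_format)
        : base_size_(BaseSize(point_format)) {}

    // Append a dimension after everything that is already in the record.
    void AddExtra(std::string name, std::size_t size_in_bytes) {
        assert(size_in_bytes > 0 && "a dimension must occupy at least one byte");
        extras_.push_back(ExtraDimension{std::move(name), size_in_bytes});
    }

    // Total bytes per point record, including the extra dimensions.
    std::size_t ByteSize() const {
        std::size_t total = base_size_;
        for (const ExtraDimension& d : extras_) total += d.size_in_bytes;
        return total;
    }

    // Byte offset of a named extra dimension within the record.
    std::size_t OffsetOf(const std::string& name) const {
        std::size_t offset = base_size_;
        for (const ExtraDimension& d : extras_) {
            if (d.name == name) return offset;
            offset += d.size_in_bytes;
        }
        assert(false && "unknown extra dimension");
        return offset;
    }

private:
    // Record sizes defined by the LAS specification for formats 0 to 3.
    static std::size_t BaseSize(int point_format) {
        switch (point_format) {
            case 0: return 20;  // core fields only
            case 1: return 28;  // core + GPS time
            case 2: return 26;  // core + RGB
            case 3: return 34;  // core + GPS time + RGB
            default:
                assert(false && "only point formats 0-3 are sketched here");
                return 0;
        }
    }

    std::size_t base_size_;
    std::vector<ExtraDimension> extras_;
};
```

For example, a format 1 layout with a 4-byte "temperature" dimension and a 1-byte "flag" dimension (both made-up names) has a record size of 33 bytes, with "flag" starting at byte offset 32.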
Refactoring of liblas::Point class
We significantly refactored the liblas::Point class for the 1.6.0 release. The first thing that was done was to make it “thinner” in the sense that it doesn’t store a union of all point-format-derived dimensions on it, and instead stores a reference to a schema that informs the class about which dimensions exist. Additionally, data are interpreted on-the-fly from the raw bytes which compose the point, eliminating the fidelity issues of earlier versions.
libLAS 1.2.1 and below utilized a liblas::Point that was kind of fat. It carried around interpreted data members for all of the dimensions on the point – x, y, z, intensity, etc – and if you asked for one of these, it just returned it to you directly. The interpretation of those data happened as the data were read, and again as the data were written (back into raw bytes).
libLAS 1.6+ has changed liblas::Point in a number of important ways. liblas::Point now only carries along the raw bytes for the point, and if you ask for one of the dimensions, it interprets it on-the-fly. For example, a GetX() call now requires going into the liblas::Point byte array, pulling the first four bytes off of it, asking the point’s header for scaling information, and rescaling the integer data into double data. If you only call GetX() one time, things are roughly equivalent to what we were doing before (interpreting and caching interpreted data directly on the liblas::Point), but every one of your subsequent calls to GetX() incurs this interpretation cost again. You need to cache the results of these calls yourself if you are reusing the values. Alternatively, you can control when your data have scaling applied by using GetRawX(), which was not possible before libLAS 1.6.
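The cost model described above can be shown with a standalone sketch that does not depend on libLAS. ThinPoint and Scaling are hypothetical stand-ins for liblas::Point and the header’s scaling information. The sketch assumes X is stored as a little-endian signed 32-bit integer in the first four bytes of the record and that the real-world value is the raw value times a scale plus an offset, as the LAS specification defines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the scaling information kept in the header.
struct Scaling {
    double scale_x;
    double offset_x;
};

// Hypothetical "thin" point: it owns only the raw record bytes.
class ThinPoint {
public:
    // The Scaling object must outlive the point.
    ThinPoint(std::vector<std::uint8_t> bytes, const Scaling& scaling)
        : bytes_(std::move(bytes)), scaling_(&scaling) {
        assert(bytes_.size() >= 4 && "record must hold at least the X field");
    }

    // Decode the first four bytes as a little-endian signed 32-bit integer.
    std::int32_t GetRawX() const {
        std::uint32_t u = 0;
        for (std::size_t i = 0; i < 4; ++i) {
            u |= static_cast<std::uint32_t>(bytes_[i]) << (8 * i);
        }
        return static_cast<std::int32_t>(u);
    }

    // Decode and rescale on every call; nothing is cached on the point.
    double GetX() const {
        return GetRawX() * scaling_->scale_x + scaling_->offset_x;
    }

private:
    std::vector<std::uint8_t> bytes_;
    const Scaling* scaling_;
};

// Caller-side caching: interpret once, then reuse the local value.
double SumOfSquaresX(const ThinPoint& p, int repeats) {
    const double x = p.GetX();  // the only interpretation in this function
    double total = 0.0;
    for (int i = 0; i < repeats; ++i) total += x * x;
    return total;
}
```

SumOfSquaresX shows the pattern a caller would adopt: pull the interpreted value into a local variable once instead of calling GetX() inside the loop.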
The rationale for moving to this approach was three-fold. First, the LAS committee continually adds new dimensions onto the specification, and I wanted an extendable way to add them to libLAS without causing a full re-engineering of the liblas::Point class every time they do. Second, liblas::Point now has a schema attached to it (based on the list of dimensions that a LAS file’s point format defines plus any custom dimensions you wish to add to the point record). The schema allows you to extend the point format and add your own dimensions and it provides generic descriptive information about what exists in the file. You can see the description of these schemas in the new lasinfo output from libLAS. Lastly, previous versions of libLAS did not allow you to work with raw data, and did not allow the user to transform the data (coordinate data, especially) with perfect fidelity. The new approach explicitly supports this out-of-the-box. Here’s something that is now possible with the new (C++) API that was not previously:
// Fetch the current point and copy out its raw, uninterpreted record bytes.
liblas::Point const& p = reader.GetPoint();
std::vector<uint8_t> data = p.GetData();
// ... do something with the raw data, like stuffing it into a database.
Refactoring of internal Reader and Writer code
The previous (< libLAS 1.2) C++ reader and writer code of libLAS was a bit inflexible, and contained significant duplication for each file format version. Giant updates would be required to the code as the ASPRS LAS standard committee added new specification versions with new required point formats. Additionally, the old code’s design was a bit rigid for adding things like generic schema support.
Both the liblas::Reader and liblas::Writer have been significantly refactored.
The reduction in duplication means going to only one place to make changes to the code. In addition to not repeating ourselves, it provides us more flexibility to add new features and the extensibility to allow the readers and writers to be overridden by user code.
Generic interfaces
A number of generic interfaces have been added to libLAS to support dynamic polymorphism. See <liblas/liblas.hpp> for the C++ interfaces. By implementing these interfaces, you can add your own reader/writer implementations as well as provide custom filtering and transformation capability.
Faster binary i/o
Mateusz Loskot developed a more savvy implementation of libLAS’s binary i/o, which provides some significant performance improvements.
Caching reader
A reader implementation that provides data caching will be provided in libLAS 1.6. If your workflow reads the data in multiple passes through the file, you can use the cached reader to cache the points (up to the size of the entire file) for faster repeated and random access.
Seek support
It is now possible to seek to a specific point in the file and start reading points. This can significantly speed up the “random sampling” access strategy where one starts reading a run of points at a specific location in the file and then moves to a different location.
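Seeking is cheap because LAS point records are fixed-length, so the position of any record can be computed directly. The following is a minimal sketch of that arithmetic on a generic std::istream, not the libLAS reader API. Layout, RecordPosition, ReadRun, and MakeFakeFile are hypothetical names, and MakeFakeFile exists only to build an in-memory stand-in for a file.

```cpp
#include <cassert>
#include <cstdint>
#include <ios>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical subset of the header fields needed to locate a point record.
struct Layout {
    std::uint32_t data_offset;    // byte offset of the first point record
    std::uint16_t record_length;  // bytes per point record
    std::uint32_t point_count;    // number of point records
};

// Byte position of the n-th (zero-based) point record.
std::streamoff RecordPosition(const Layout& layout, std::uint32_t n) {
    assert(n < layout.point_count && "point index out of range");
    return static_cast<std::streamoff>(layout.data_offset) +
           static_cast<std::streamoff>(n) * layout.record_length;
}

// Jump to record `start` and read up to `run` consecutive raw records.
std::vector<std::vector<char>> ReadRun(std::istream& in, const Layout& layout,
                                       std::uint32_t start, std::uint32_t run) {
    std::vector<std::vector<char>> records;
    in.seekg(RecordPosition(layout, start), std::ios::beg);
    for (std::uint32_t i = 0; i < run && start + i < layout.point_count; ++i) {
        std::vector<char> record(layout.record_length);
        if (!in.read(record.data(), layout.record_length)) break;
        records.push_back(std::move(record));
    }
    return records;
}

// Demo scaffold: an in-memory "file" whose records each begin with a byte
// holding their own index (so it is only meaningful for small point counts).
std::string MakeFakeFile(const Layout& layout) {
    std::string bytes(layout.data_offset, '\0');  // stands in for the header
    for (std::uint32_t i = 0; i < layout.point_count; ++i) {
        std::string record(layout.record_length, '\0');
        record[0] = static_cast<char>(i);
        bytes += record;
    }
    return bytes;
}
```

A random-sampling pass would call ReadRun repeatedly with different start indices, paying for one seek per run instead of reading every record in between.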
Classification class
A class is now provided to abstract the LAS classification value and help interpret the bit fields that are present for synthetic, key point, and withheld types.
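As an illustration of the bit fields involved, here is a small standalone sketch. It is not the libLAS classification class. ClassificationByte and MakeClassification are hypothetical names, and the bit positions follow the LAS 1.1 and later layout for the classification byte: bits 0-4 hold the class code, bit 5 is synthetic, bit 6 is key point, and bit 7 is withheld.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper around the single classification byte of a point.
struct ClassificationByte {
    std::uint8_t value;

    // Bits 0-4: the class code (0-31).
    std::uint8_t GetClass() const {
        return static_cast<std::uint8_t>(value & 0x1F);
    }
    // Bit 5: point was created synthetically rather than measured.
    bool IsSynthetic() const { return (value & 0x20) != 0; }
    // Bit 6: point is a model key point.
    bool IsKeyPoint() const { return (value & 0x40) != 0; }
    // Bit 7: point should be withheld from processing.
    bool IsWithheld() const { return (value & 0x80) != 0; }
};

// Pack a class code and the three flags back into one byte.
ClassificationByte MakeClassification(std::uint8_t class_code, bool synthetic,
                                      bool key_point, bool withheld) {
    assert(class_code <= 31 && "class code must fit in five bits");
    std::uint8_t v = class_code;
    if (synthetic) v = static_cast<std::uint8_t>(v | 0x20);
    if (key_point) v = static_cast<std::uint8_t>(v | 0x40);
    if (withheld) v = static_cast<std::uint8_t>(v | 0x80);
    return ClassificationByte{v};
}
```

Keeping the masks in one place means calling code never has to repeat the bit arithmetic, which is the convenience a dedicated class offers.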