2010 Annual Meeting: Workshop on Transient Anomalous Strain Detection

Organizers: Rowena Lohman, Jessica Murray-Moraleda
Date: Saturday, September 11, 2010 (10:00am - 5:30pm)
Location: Palm Canyon Room, Hilton Palm Springs Resort, Palm Springs, CA

The Transient Detection Test Exercise is a project in support of one of the SCEC3 priority science objectives, to “develop a geodetic network processing system that will detect anomalous strain transients.” Fulfilling this objective will fill a major need of the geodetic community. A means for systematically searching geodetic data for transient signals has obvious applications for network operations, hazard monitoring, and event response, and may lead to identification of events that would otherwise go (or have gone) unnoticed.

As part of the test exercise, datasets are distributed to participants, who then apply their detection methodology and report back on any transient signals they find in the test data. We are currently in Phase III of the project. Phases I and II used synthetic GPS datasets. Phase III test data comprise synthetic and real GPS observations. Test data, results from completed phases, and further information are available at http://groups.google.com/group/SCECtransient.

The objectives of the workshop will be to assess what we have learned to date, discuss directions on which to focus as the project moves forward, and establish a timeline for these activities. Presentations and discussion in the first half of the workshop will cover advances in methodologies made over the past year by participating groups, improvements to synthetic data, release of the true signals present in Phase III synthetic data, and review of Phase III participants’ results. In the second half of the workshop, participants will address future directions, with particular focus on 1) application of methods to real data, 2) steps needed for operational deployment of algorithms, and 3) extension of testing to data types other than GPS.

All interested individuals are encouraged to attend, regardless of whether they have participated in the test exercise up to this stage.

PARTICIPANTS: Duncan Agnew (UCSD), Roger Bilham (Colorado), Adrian Borsa (UNAVCO), Oliver Boyd (USGS), Ivy S. Carpenter (UCLA), Saul Castro (PCC), Francesco Civilini (UCSB), Maud Comboul (USC), Brendan Crowell (UCSD), Robert Curren (Kansas State), Adrian Doran (Dartmouth), Amy Eisses (UNR), Jean Elkhoury (Caltech), Mike Floyd (UCR), Gareth Funning (UCR), Vahe Gabuchian (Caltech), John Galetzka (Caltech), Robin Gee (UCSB), Marisol Gonzalez (Chico State), Margaret Gooding (LSA Assoc, Inc.), Sven Hauksson (UNR), Pamela Henry (Fault Line, LLC), Tom Herring (MIT), Bill Holt (SUNY-Stony Brook), Tran Huynh (SCEC), Kang Hyeun Ji (MIT), Lizhen Jin (UCR), Tom Jordan (USC), Jingqian Kang (TAMU), Haydar Karaoglu (CMU), Sharon Kedar (JPL), Corné Kreemer (UNR), Nadia Lapusta (Caltech), Samuel Lauda (PCC), Brad Lipovsky (UCR), Zaifeng Liu (TAMU), Rowena Lohman (Cornell), Jeff McGuire (WHOI), Jessica Murray-Moraleda (USGS), Sidao Ni (UST China), Marleen Nyst (RMS, Inc.), Shahid Ramzan (CSUN), Paul Segall (Stanford), Mark Simons (Caltech), Anthony Sladen (Caltech), Surendra Somala (Caltech), Wayne Thatcher (USGS), Brendon Walker (SDSU), Chris Walls (UNAVCO), Wei Wang (USC), Matt Wei (UCSD), Matthew Weller (UNR), Yi-Ying Wen (UCR), Maximilian Werner (ETHZ), Kyle Withers (SDSU), Sayoko Yokoi (ERI Tokyo), Yuehua Zeng (USGS), Zhongwen Zhan (Caltech), Jinquan Zhong (SDSU)

Agenda:

10:00-10:15	Update on the test exercise – developments over the past year and topics to discuss at the workshop	Jessica Murray-Moraleda
10:15-12:00	Presentations about different methodology including results (15 minutes each)	Kang Hyeun Ji; John Langbein; Sharon Kedar; Brad Lipovsky; Matt Weller; Bill Holt; Junichi Fukuda / Paul Segall
12:00-13:00	Lunch
13:00-14:00	Presentations about different methodology including results (continued)	Jessica Murray-Moraleda / Zhen Liu; Jeff McGuire; Maud Comboul; Zhongwen Zhan / Mark Simons
14:00-14:20	Test data: (1) synthetic data and FAKENET code – innovations since last year; (2) real data – overview of how it was processed	Duncan Agnew; Tom Herring
14:20-14:40	Phase III results: discussion/questions about results, plots of how models performed	Rowena Lohman
14:40-15:00	Break
15:00-15:30	Discussion
	What have we learned from Phase III in terms of detection limits, false alarms, real data versus synthetic?
	Is further testing needed? What should be the goals of further GPS testing and what should be required of participants?
	Establish a timeline for testing GPS transient detectors in an operational setting (real-time monitoring of real data).
	Extension to other data types: Which data type(s)? Synthetic and/or real data? How should a test exercise be designed that utilizes other data types (perhaps in combination with GPS)? What infrastructure would be needed? What metrics would be used? Establish timeline.

Workshop Report

Introduction

The SCEC Geodetic Transient Detection Exercise is an ongoing project targeting the SCEC Science Priority Objective to “Develop a geodetic network processing system that will detect anomalous strain transients”. Three phases of testing have been completed. During each phase participants were presented with test data consisting of GPS time series contaminated with a variety of noise sources in addition to, in some cases, signals due to transient fault slip. In all three phases synthetic data were used, and in Phase III one set of real data from southern California GPS sites was also included. Participants applied the transient detection methods they are developing to these data in an attempt to detect the fault slip signals in the data. Further details on the mechanics of the exercise can be found in our report on the 2009 workshop (Murray-Moraleda and Lohman, 2010).

At the 2010 workshop, eleven groups presented the transient detection methodologies they are developing. Duncan Agnew presented improvements to his Fakenet code, which was used to generate the synthetic test data; Tom Herring described the steps used in generating the real data that were also provided for Phase III; and Rowena Lohman then presented a summary of the results from participants in Phase III. The approaches being developed, the test data sets, and the Phase III results are detailed in the workshop slides for individual presentations. This report focuses on the discussion that took place at the workshop and on the next steps for the project. The two main issues that arose in the discussion were further assessment of which signals algorithms can reliably detect and initial testing of algorithms on a continuously running, near-real-time basis.

Further Phases of the Transient Detection Exercise

The Phase III synthetic test data contained substantially more subtle and complex signals than did the Phase I and II datasets. As a result, the detections reported by the various participants presented a more nuanced picture of the current level of detection capability among these algorithms and highlighted the approaches’ strengths and weaknesses. It was generally agreed that developers must now assess their own algorithms’ performance in order to improve functionality. More systematic analysis of the methods’ sensitivity and false alarm rates as a function of station distribution and source characteristics is also required. The Fakenet code is a valuable resource for this purpose, allowing users to generate as many synthetic datasets as they like with a range of characteristics for internal testing.

At the same time, workshop attendees recognized the need for additional “blind testing” in order to obtain more objective tests of algorithm capability, allow comparison of the algorithms’ strengths and complementary features, and continue to foster a community of researchers targeting the goal of transient detection. Continued use of synthetic data has a role for testing specific source characteristics and for identifying the range of signals that algorithms just cannot detect.

However, there was general agreement that the format of the test phases thus far could be improved in the following ways. First, although many algorithms can efficiently ingest one new position estimate per station daily, applying the algorithms retrospectively to ten years of data is time-consuming. As a result, few participants were able to analyze all datasets as carefully as needed, and the results may be an inaccurate representation of the algorithms’ capabilities. Second, presenting the summary of participants’ results at the workshop allows no time for developers to assess their performance and, in the case of missed detections, analyze why the algorithm failed. Finally, in some ways the “blind test” exercise has been too blind, in that participants had little input into the types of transient signals represented in the data. It is likely that participants will feel more invested in the project and be more motivated to take part if they feel they have a greater voice in debating and choosing the sorts of signals that should be included in the test data.

Based on this feedback, we recommend the following format for the next phase of the test exercise. We hope that this format will create greater buy-in among participants and lead to sustained improvement of the methods under development.

Developers use the Fakenet code (or other tools if they so choose) to generate their own data for internal testing to improve specific functionality as appropriate for their algorithms.
A single dataset (as opposed to 4-12 in previous phases) consisting of time series for southern California GPS stations that may contain one or more transient slip signals will be released quarterly. Participants will apply their algorithms to this dataset and upload results to the website as usual. A summary of results will then be posted to the website for review and online discussion.
Prior to each quarterly release of test data, participants will propose features for the upcoming dataset that will be designed to test specific capabilities (e.g., the ability to detect multiple transients that overlap in time, a transient at the end of the time series, or a transient that occurs at the edge of the network). The proposed tests can and should include tests with real data.

Continuously running near-real-time detection algorithms

While workshop participants generally agreed that further work on improving algorithms could be fruitfully pursued for some time, there was extensive discussion of how to move the project closer to realizing the SCEC goal of operational deployment of one or more detection algorithms. In particular, the SCEC leadership expressed intense interest in having one or more detection algorithms running continuously in an automated fashion at the start of SCEC4 in February 2012.

There was discussion of who would receive (and presumably act on) reports of detected transients were an operational system to be running. Several workshop participants expressed concern that at the current stage of development, most algorithms could not produce results on an ongoing basis without a level of developer intervention that would make doing so prohibitively time-consuming. Moreover, there was universal agreement that the results of any continuously operating detection algorithms should not be made publicly available during the testing phases, especially since most algorithms still lack a rigorous detection criterion. It was also suggested that if several algorithms identified the same transient signal in real data, with community consensus regarding the source, this would be a more exciting result than pressing for lower latency at this stage.

However, many participants acknowledged that if we do not push for continuous, near-real-time deployment of algorithms, it will not happen on its own in a timely manner. With one or more algorithms running in an ongoing way, other detection exercise participants who are at an earlier stage of development could compare their results to those from the automatically running algorithms. Moreover, running a large number of tests, for example to statistically assess the sensitivity and specificity of the algorithms, can be done more efficiently if algorithms can be run with little user intervention.
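As an illustration of the statistical assessment mentioned above, the following is a minimal sketch of how sensitivity and specificity might be tallied over a batch of automated test runs. It assumes each synthetic dataset is simply labeled as containing a transient or not, and that an algorithm's report is reduced to a yes/no detection; the names `BlindTestOutcome` and `sensitivity_specificity` are hypothetical and are not part of Fakenet or any participant's code.

```python
from dataclasses import dataclass


@dataclass
class BlindTestOutcome:
    """Result for one synthetic dataset (hypothetical record layout)."""
    has_transient: bool  # ground truth known to whoever generated the data
    detected: bool       # whether the algorithm reported a transient


def sensitivity_specificity(outcomes):
    """Return (sensitivity, specificity) over a list of BlindTestOutcome.

    sensitivity = true positives / datasets that contain a transient
    specificity = true negatives / datasets that contain no transient
    A rate is NaN when its denominator is zero.
    """
    tp = sum(o.has_transient and o.detected for o in outcomes)
    fn = sum(o.has_transient and not o.detected for o in outcomes)
    tn = sum((not o.has_transient) and (not o.detected) for o in outcomes)
    fp = sum((not o.has_transient) and o.detected for o in outcomes)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

The false alarm rate is one minus the specificity. A real assessment would also need to decide what counts as a correct detection in space and time, which this sketch leaves out.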

While operational detection methods are the long-term goal, the target for February 2012 would necessarily be modest and certainly would not be described as “operational”. There was consensus that implementing one or more algorithms to automatically report detections on a weekly basis (e.g., based on daily data for the previous week) was a good first target, and this would mesh well with the weekly availability of the final orbits used in the GPS processing. Algorithm developers would be notified of these detections for review, similar to the way in which automatic earthquake location and magnitude reports are reviewed by a seismologist. Initially only participants in the project, SCEC leadership as appropriate, and perhaps GPS network operators would have access to the results. Once detection algorithms are operational, USGS would likely have the responsibility, given their mandate, to respond to potential transient fault slip events. The output of detection algorithms will also be useful to network operators for monitoring data integrity.

Implementation of one or more continuously operating algorithms will require IT infrastructure to provide a steady stream of input (i.e., daily GPS positions) to the algorithms, ingest the output of algorithms to display detection information in map and timeline views, and alert developers when their algorithm returns a detection. There are several sources of continually updated daily processed GPS data that might be used as input. Two applicable examples are those from the Plate Boundary Observatory (PBO) processing centers and the NASA MEaSURES project, both of which cover southern California. Presumably, some work would be necessary to put these data into a standard format that could be accepted by the detection algorithms. It may be possible to draw upon parts of the CSEP infrastructure for running the algorithms and reporting results, or at least to use its development as a model. In CSEP, participants provide an executable of their software as well as source code. Their software is run within the CSEP testing architecture using standard input and output without participant intervention, which relieves developers of the burden of maintaining continuous operation of their code and reporting results. SCEC leadership expressed willingness to provide support for the development of the necessary IT infrastructure.

We recommend the following steps to achieve an initial phase of continuous monitoring by February 2012 and have added items to the 2011 RFP reflecting this:

Those participants who feel their algorithms are sufficiently well-developed propose for 2011 to modify the algorithms to accept GPS positions on a daily basis and report detections (preferably with a measure of the confidence level at which the detection is made) on a weekly basis.
Tectonic Geodesy leaders work with SCEC IT experts to establish the tasks required for development of the necessary IT infrastructure, and SCEC solicits proposals.
Participants and Tectonic Geodesy leaders work together to identify a source for daily GPS positions and establish standard I/O formats.
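The daily-input, weekly-report cycle recommended above can be sketched as follows. This is only an illustration under stated assumptions: the record layouts `DailyPosition` and `Detection`, the grouping by ISO calendar week, and the convention that a detector returns a confidence value or `None` are all hypothetical, since the standard I/O formats have yet to be established.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DailyPosition:
    """One daily position estimate (hypothetical layout, millimeters)."""
    station: str
    day: date
    east_mm: float
    north_mm: float
    up_mm: float


@dataclass(frozen=True)
class Detection:
    """One weekly detection report (hypothetical layout)."""
    iso_year: int
    iso_week: int
    confidence: float  # meaning is defined by each algorithm


def group_by_iso_week(positions):
    """Map (ISO year, ISO week) to that week's positions, sorted by day."""
    weeks = defaultdict(list)
    for p in positions:
        iso_year, iso_week, _ = p.day.isocalendar()
        weeks[(iso_year, iso_week)].append(p)
    for batch in weeks.values():
        batch.sort(key=lambda p: (p.day, p.station))
    return dict(weeks)


def weekly_reports(positions, detector):
    """Run `detector` once per week of data.

    `detector` takes a list of DailyPosition and returns a confidence
    value, or None when it finds nothing to report.
    """
    reports = []
    for (iso_year, iso_week), batch in sorted(group_by_iso_week(positions).items()):
        confidence = detector(batch)
        if confidence is not None:
            reports.append(Detection(iso_year, iso_week, confidence))
    return reports
```

A realistic detector would look at the full history of each station and not only the most recent week; the weekly batch here only marks when a report is issued.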

Publication of the results thus far

Following the workshop, it was suggested that the results of the transient detection blind test exercise thus far be published in order to summarize and make available to the broader community what we have learned. Such a report would have two main components: a description of each approach and its pros and cons, and a discussion of what combination of features from similar methodologies might provide the best results. We envision a brief report that focuses on outcomes and next steps. Descriptions of methodologies should therefore refer to published work where possible, with the necessary details for yet-to-be-published algorithms given in an online supplement, which may also contain the synthetic testing data. Because the detection methods that have been employed in the test exercise generally fall into several broad categories (for example, methods based on PCA, Kalman filtering, or strain rate mapping), the report should devote ample attention to the ways in which complementary approaches may be used in tandem and what the next steps are for implementing ongoing transient monitoring.

Action items include the following:

Contact all participants in the blind test exercise to solicit their participation.
Establish the scope and format of the document and identify an appropriate journal to which to submit the report.
Devise a schedule for compilation of participant contributions to the report, editing, and review.

Incorporation of other data types

Discussion of extending the test exercise to other data types was limited as attention during the workshop focused primarily on the issue of continuously-operating algorithms. However, it should be noted that one of the future SCEC activities described in the SCEC4 proposal is the generation of a Community Geodetic Model which would incorporate both GPS and InSAR time series. Thus, adapting or developing transient detection algorithms to use InSAR, either alone or in combination with GPS data, is an important target for future work. Indeed, the U.C. Riverside group presented results at the workshop that demonstrated the application of a PCA-based approach to removing seasonal signals from InSAR data. Such tools are potentially valuable for enabling the identification of other time-varying signals in the data as well.
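To make the idea of PCA-based seasonal removal concrete, here is a generic sketch. It is not the U.C. Riverside group's method, whose details are given in their workshop slides and not in this report. It assumes the data form a matrix with one row per epoch and one column per time series (for example, pixels or stations), computes principal components via the singular value decomposition, and drops any component whose temporal pattern is well fit by an annual sinusoid. The function name, the `r2_threshold` of 0.8, and the purely annual period are all illustrative assumptions.

```python
import numpy as np


def remove_seasonal_modes(data, t_days, r2_threshold=0.8, period_days=365.25):
    """Remove principal components whose time function looks annual.

    data   : array of shape (n_epochs, n_series)
    t_days : epoch times in days, length n_epochs
    Returns (cleaned_data, list_of_removed_component_indices).
    """
    mean = data.mean(axis=0)
    centered = data - mean
    # Columns of u are temporal modes; rows of vt are spatial patterns.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)

    phase = 2.0 * np.pi * np.asarray(t_days, dtype=float) / period_days
    design = np.column_stack([np.sin(phase), np.cos(phase), np.ones_like(phase)])

    removed = []
    for k in range(len(s)):
        mode = u[:, k]
        coef, *_ = np.linalg.lstsq(design, mode, rcond=None)
        resid = mode - design @ coef
        total = np.sum((mode - mode.mean()) ** 2)
        r2 = 1.0 - np.sum(resid ** 2) / total if total > 0 else 0.0
        if r2 >= r2_threshold:
            removed.append(k)

    keep = np.ones(len(s), dtype=bool)
    keep[removed] = False
    # Rebuild the data from the components that were kept.
    cleaned = (u[:, keep] * s[keep]) @ vt[keep] + mean
    return cleaned, removed
```

One caveat of this kind of approach is that a transient which shares a principal component with the seasonal signal would be partly removed along with it, so the threshold and the choice of components need care.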

The next Transient Detection workshop

Annual workshops like those that we have held at the 2009 and 2010 SCEC meetings are an important means of maintaining project momentum. These workshops attracted a broad audience and provided a forum for test exercise participants to describe their methodology. As we move forward, however, some modifications are required to better reflect the evolving needs of the project and address drawbacks of the current format. The envisioned 2011 workshop will be more targeted, thus helping to advance the state of the art more efficiently.

We propose a two-part workshop aimed at those groups who are actively participating in the test exercise and/or extending their methodology to continuous operation. The first half of the workshop will focus on advances to methodology based on results from the periodic blind tests that will be held throughout the year. Because the true signals will be made available directly after each blind test, participants will have had time to investigate any failures of their algorithms before the workshop, and discussion can focus on the strengths of individual approaches and ways in which multiple methods might be used together. The second half of the workshop will be dedicated to planning for the implementation of continuously operating algorithms, building on discussions that will be occurring throughout 2011 with the groups who plan to participate in this aspect. One difficulty that some participants have faced in previous years is that SCEC funding has arrived too late in the year for groups to complete all the work they proposed in time for workshops held at the SCEC annual meeting. Therefore, we propose to hold this one-day workshop directly after the AGU Fall Meeting in San Francisco.

Conclusion

The SCEC Transient Detection Exercise has succeeded in raising awareness of possible approaches to the detection problem, bringing together a group of researchers actively working on this problem, and spurring the development of a diverse set of algorithms. It is now time to move to the next stage, in which we identify the most promising combination of approaches and strive to implement these tools in a continuously operating manner. The September 2010 workshop provided invaluable feedback for shaping this transition via the following steps:

Conduct periodic blind tests designed with greater participant input and involving a more manageable amount of test data.
Provide a framework for the extension of one or more algorithms to continuous operation.
Publish the results to date as a record of what has been learned and a foundation for further development.
Encourage work that will enable extension of transient detection to other data types.
Hold a targeted workshop which will involve active participants in the test exercise and focus on the application of methodological advances and concrete next steps for continuous operation.

References

Murray-Moraleda, J. and R. Lohman (2010), Workshop targets development of geodetic transient detection methods, Eos Trans. AGU, 91(6), 58.
