NASA Jet Propulsion Laboratory
GENESIS Global Environmental and Earth Science Information Systems

SciFlo

"SciFlo" stands for Scientific Dataflow. SciFlo is a system for Scientific Knowledge Creation on the Grid using a Semantically-Enabled Dataflow Execution Environment. It builds on Simple Object Access Protocol (SOAP) Web Services and the Grid computing standards (the WS-* standards and the Globus Alliance toolkits). With it, scientists can do multi-instrument Earth science by assembling reusable SOAP services, native executables, local command-line scripts, and Python code into a distributed computing flow (a graph of operators).
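To make "a graph of operators" concrete, here is a minimal Python sketch of a dataflow run in dependency order. It is not SciFlo's engine or API: the function `run_dataflow`, the operator names, and the sample values are all hypothetical, and everything runs in one local process rather than on a Grid.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dataflow(operators, dependencies, inputs):
    """Run each operator once all of its upstream results are available.

    operators:    operator name -> callable (hypothetical stand-ins for services or scripts)
    dependencies: operator name -> list of upstream names (operators or external inputs)
    inputs:       external input name -> value
    """
    results = dict(inputs)
    # static_order() yields every node after all of its predecessors.
    for name in TopologicalSorter(dependencies).static_order():
        if name in results:
            continue  # an external input, nothing to compute
        args = [results[upstream] for upstream in dependencies[name]]
        results[name] = operators[name](*args)
    return results


def mean(values):
    return sum(values) / len(values)


# Hypothetical two-instrument comparison: average each instrument, then difference.
operators = {"mean_a": mean, "mean_b": mean, "difference": lambda a, b: a - b}
dependencies = {
    "mean_a": ["instrument_a"],
    "mean_b": ["instrument_b"],
    "difference": ["mean_a", "mean_b"],
}
inputs = {
    "instrument_a": [280.0, 282.0, 284.0],
    "instrument_b": [279.0, 281.0, 283.0],
}

results = run_dataflow(operators, dependencies, inputs)
print(results["difference"])
```

In a real deployment each operator could be a remote SOAP service or a local executable; the sketch only illustrates how a graph of operators determines execution order.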


The SciFlo client & server engines optimize the execution of such distributed data flows and allow the user to transparently find and use datasets and operators without worrying about the actual location of the Grid resources. The scientist injects a distributed computation into the Grid by simply filling out an HTML form or directly authoring the underlying XML dataflow document, and results are returned directly to the scientist's desktop. A Visual Programming tool is also being developed, but it is not required. Once an analysis has been specified for a granule or day of data, it can be easily repeated with different control parameters and over months or years of data.
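As a rough illustration of repeating one analysis with different control parameters, the sketch below parses a small XML flow description and overrides its parameters per run. The element and attribute names (`flow`, `param`, `operator`, `inputs`) are invented for this example and are not SciFlo's actual document schema; `parse_flow` is likewise a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical flow document; SciFlo's real XML schema is not shown in this page.
FLOW_XML = """
<flow name="daily_mean_difference">
  <param name="date" default="2005-01-01"/>
  <param name="region" default="global"/>
  <operator name="mean_a" inputs="instrument_a"/>
  <operator name="mean_b" inputs="instrument_b"/>
  <operator name="difference" inputs="mean_a mean_b"/>
</flow>
"""


def parse_flow(xml_text, **overrides):
    """Return (flow name, control parameters, operator dependencies)."""
    root = ET.fromstring(xml_text)
    params = {p.get("name"): p.get("default") for p in root.findall("param")}
    unknown = set(overrides) - set(params)
    if unknown:
        raise ValueError(f"unknown control parameters: {sorted(unknown)}")
    params.update(overrides)
    dependencies = {
        op.get("name"): op.get("inputs", "").split()
        for op in root.findall("operator")
    }
    return root.get("name"), params, dependencies


name, params, dependencies = parse_flow(FLOW_XML)

# Repeat the same analysis over several days by changing one control parameter.
runs = [parse_flow(FLOW_XML, date=day)[1] for day in ("2005-01-01", "2005-01-02")]
```

The flow definition stays fixed while only the parameter values change, which is what makes a single specification reusable across months or years of data.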

Goals

The goal of SciFlo is to enable large-scale, multi-instrument Earth science. The SciFlo Network is a Peer-to-Peer (P2P) network of Grid workflow nodes, but SciFlo exploits several technology trends, not just workflow. Each SciFlo node serves many purposes and bundles together multiple open-source technologies: SOAP-based Web Services, the SciFlo dataflow engine, a file redirection/caching server, metadata stored in a relational database (MySQL), an XQuery-able XML document store (Sleepycat dbxml), and a collaboration environment (shared wiki pages). The challenge is to integrate all of these technologies into a Grid workflow and collaboration environment that is lightweight, user installable, and scalable, and that runs on hardware ranging from Windows laptops to Linux clusters. The environment should also support distributed queries, declarative dataflow, visual programming, load-balanced parallel execution, publishable algorithms and analysis flows, and generated products that carry preserved lineage and semantic annotations. In short, the goal is for each scientist to have a personal scientific notebook and a personal data center that are tied automatically into a P2P network enabling Grid computation and group collaboration, all with great ease of use.


All of the power of SciFlo is available through a web browser interface. To execute a SciFlo document, possibly shared by a friend, you simply provide the desired inputs by filling out an HTML form in your browser of choice. To author a dataflow, you start from a template and edit the XML document in outline form using a smart XML editor, or you use the visual programming tool. The distributed dataflow execution network then does the rest:

  • It choreographs parallel execution, potentially using many nodes.
  • Data & operator movement is done automatically by the engine.
  • Each node serves data & operators, executes SciFlo documents, and is a client of other nodes.
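The idea of choreographing parallel execution can be sketched locally: operators whose inputs are ready run concurrently, and downstream operators start as soon as their upstream results arrive. This sketch uses a thread pool on one machine as a stand-in for many nodes; `run_parallel` and the example operators are hypothetical and do not reflect how the SciFlo engine schedules or moves work.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_parallel(operators, dependencies, inputs, max_workers=4):
    """Run independent operators concurrently, respecting dependencies."""
    results = dict(inputs)
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    pending = {}  # future -> operator name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every operator whose upstream results are all available.
            for name in sorter.get_ready():
                if name in results:  # external input, nothing to compute
                    sorter.done(name)
                    continue
                args = [results[upstream] for upstream in dependencies[name]]
                pending[pool.submit(operators[name], *args)] = name
            if not pending:
                continue
            # Block until at least one running operator finishes.
            finished, _ = wait(list(pending), return_when=FIRST_COMPLETED)
            for future in finished:
                name = pending.pop(future)
                results[name] = future.result()
                sorter.done(name)  # unlocks operators that depend on this one
    return results


def mean(values):
    return sum(values) / len(values)


# Hypothetical flow: the two means are independent and may run at the same time.
operators = {"mean_a": mean, "mean_b": mean, "difference": lambda a, b: a - b}
dependencies = {
    "mean_a": ["instrument_a"],
    "mean_b": ["instrument_b"],
    "difference": ["mean_a", "mean_b"],
}
inputs = {
    "instrument_a": [280.0, 282.0, 284.0],
    "instrument_b": [279.0, 281.0, 283.0],
}

results = run_parallel(operators, dependencies, inputs)
```

Here `mean_a` and `mean_b` have no dependency on each other, so both are submitted in the same pass; `difference` is submitted only after both have completed.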


SciFlo pervasively uses many XML-based technologies:

  • Metadata described in XML and XML schema
  • Distributed computing via XML messaging (SOAP)
  • Service & operator interfaces described in XML (WSDL)
  • Services published in queryable catalogs (UDDI)
  • Operators and data typed using XML schema and namespaces
  • Semantic kind annotations added to all products using Earth science ontologies (RDF/OWL)
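For readers unfamiliar with XML messaging, the sketch below builds a bare SOAP 1.1 request envelope using only the Python standard library. The envelope namespace is the standard SOAP 1.1 one; the service namespace, the operation name `subsetByRegion`, and its arguments are made up for illustration and do not correspond to any published SciFlo service.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:sciflo-demo"  # hypothetical service namespace


def build_soap_request(operation, arguments):
    """Serialize an operation call as a SOAP 1.1 envelope (no header, no types)."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for key, value in arguments.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{key}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical operation and arguments.
request = build_soap_request(
    "subsetByRegion", {"dataset": "instrument_a", "region": "global"}
)
print(request)
```

In practice a WSDL document would describe the operation names and argument types, and a SOAP client library would generate envelopes like this one from it.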



Version 1.0.5 of the SciFlo software bundle is available here. If you are interested in being a beta tester, contact Brian Wilson at Brian.Wilson@jpl.nasa.gov.

