
Final monitoring design

mdutoo edited this page Apr 30, 2012 · 82 revisions

Monitoring architecture

Proxy architecture

EasySOA Monitoring is built on an extensible Proxy architecture:


  • The Proxy will handle all the calls and give them to the Handler Manager.

  • The Handler Manager will in turn call all Handler components through interface methods (TODO): on the way in, preForward(in) and postForward(in); on the way out, preResponse((in,) out) and postResponse((in,) out).

  • Extension: users can add their own handlers, either from Java code or by declaring the component in the appropriate composite file. Predefined handlers are RecordHandler, DiscoveryHandler, EventHandler and MonitoringHandler.

  • Deployment: by default (EasySOA Light) the proxy is integrated and deployed as an HTTP proxy or tunnel on FraSCAti's Jetty HTTP server. This makes testing easy but is less suited to production use, where the proxy should be integrated and deployed within its actual service engine. TODO next we'll also try in FraSCAti's CXF (useful in SmartTravel use case, also in FraSCAti Studio, interesting in CXF on its own).

  • About CXF interceptors and JAXWS handlers: EasySOA Proxy Handlers differ from them in that they are more focused on proxying. They could still use them, or other engine-specific features.
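
The handler contract above is still marked TODO, so the following is only a minimal sketch of how it could look, not the actual EasySOA code. The `Message` class, the `Handler` and `HandlerManager` signatures and the `TraceHandler` example are all assumptions made for illustration; only the four method names come from this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical message holder; a real proxy would wrap an HTTP exchange.
class Message {
    final String body;

    Message(String body) {
        this.body = body;
    }
}

// Assumed handler contract, using the four method names listed above.
// Default no-op bodies let a handler override only the steps it cares about.
interface Handler {
    default void preForward(Message in) {}
    default void postForward(Message in) {}
    default void preResponse(Message in, Message out) {}
    default void postResponse(Message in, Message out) {}
}

// Calls every registered handler, in registration order, at each step.
class HandlerManager {
    private final List<Handler> handlers = new ArrayList<>();

    void addHandler(Handler handler) {
        handlers.add(handler);
    }

    void preForward(Message in) {
        for (Handler h : handlers) h.preForward(in);
    }

    void postForward(Message in) {
        for (Handler h : handlers) h.postForward(in);
    }

    void preResponse(Message in, Message out) {
        for (Handler h : handlers) h.preResponse(in, out);
    }

    void postResponse(Message in, Message out) {
        for (Handler h : handlers) h.postResponse(in, out);
    }
}

// Example user-defined handler (hypothetical): remembers the hooks it has seen.
class TraceHandler implements Handler {
    final List<String> trace = new ArrayList<>();

    @Override
    public void preForward(Message in) {
        trace.add("preForward:" + in.body);
    }

    @Override
    public void postResponse(Message in, Message out) {
        trace.add("postResponse:" + out.body);
    }
}
```

In this sketch a user extends the proxy from Java code by calling `addHandler`; declaring the handler in a composite file would replace that call with configuration.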

RecordHandler

This handler's sole job is to record calls so that they can be replayed. TODO see how we go from there to templatized replay, assertions and simulation.

DiscoveryHandler

TODO rename accordingly in code

This handler does service discovery by monitoring: it counts service call events and registers services in EasySOA Core along with their callers (service reference). NB. We may add other information if it appears necessary.

The default implementation does this naively, entirely in memory. An alternate one using Esper's capabilities will be provided in another, GPL'd project.
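
As an illustration of what an all-in-memory implementation might keep track of, here is a small sketch of a call counter. The class name `InMemoryCallCounter`, its methods and the "caller -> service" key format are hypothetical, and registration in EasySOA Core is deliberately left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory counter of service calls, keyed by (caller, service).
class InMemoryCallCounter {
    private final Map<String, AtomicLong> counts = new ConcurrentHashMap<>();

    private static String key(String callerHost, String serviceUrl) {
        return callerHost + " -> " + serviceUrl;
    }

    // Records one call and returns the updated count for that pair.
    long recordCall(String callerHost, String serviceUrl) {
        return counts
                .computeIfAbsent(key(callerHost, serviceUrl), k -> new AtomicLong())
                .incrementAndGet();
    }

    // Returns how many calls were seen for that pair, or 0 if none.
    long countFor(String callerHost, String serviceUrl) {
        AtomicLong count = counts.get(key(callerHost, serviceUrl));
        return count == null ? 0L : count.get();
    }

    // Number of distinct (caller, service) pairs seen so far.
    int distinctReferences() {
        return counts.size();
    }
}
```

Everything is lost when the proxy stops and nothing is shared between proxies, which is the kind of limitation an Esper-based alternative could address.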

EventHandler

This handler will call other services that have registered EventListeners.

MonitoringHandler

This handler's role is to generate information about web service calls (SOALatency, ResponseTime...) for EasiFab's SOA monitoring feedback loop. It embeds a Jasmine Collector that puts monitoring events in the Jasmine event database.

Monitoring event model

Monitored fields may come from:

  • monitored exchange (e.g. HTTP: headers, content, or data further extracted from it, e.g. from XML / JSON content)
  • proxy execution (e.g. Java context information (security: Principal, transaction), OS process id, host name & IP)
  • proxy configuration (e.g. filters on monitored exchanges, setup info such as environment): probe identity (what it probes)
  • Jasmine probe configuration or execution (could also be in the previous entry, since here the probe is the proxy)

EasiFab / Jasmine needs

For their own processing, they need information that falls into two categories:

  • Some is available directly in monitored exchanges,
  • and the rest has to be computed afterwards by aggregating several exchange monitoring events, which must therefore still be available in one of:
    • an external dedicated database like Jasmine's
    • an embedded database, as in Talend. Limitations: can't aggregate events across several MonitoringHandlers / proxies
    • MonitoringHandler memory. Limitations: as above, and can only do simple aggregations.

Data that have to be computed by aggregating events:

  • SOALatency: an integer, computed by MonitoringHandler as t2-t1+t4-t3, i.e. (t4-t1) - (t3-t2). It has to be computed in an external database, since those values come from several MonitoringHandlers. However, (t4-t1) and (t3-t2) could each be computed in memory by a single MonitoringHandler, if request and response are correlated. It only makes sense if both proxies / probes are as close as possible to client & server.
  • ResponseTime: an integer, computed by MonitoringHandler as the difference between t4 and t1. It is only accurate if the proxy / probe is as close as possible to what it measures (server or client).
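
The arithmetic above can be sketched as follows. This assumes, based on the timestamp questions further down this page, that t1 and t4 are taken on the client side (request sent, response received) and t2 and t3 on the server side (request received, response sent). The `ExchangeTimings` class and its method names are hypothetical, and timestamps are taken to be in milliseconds.

```java
// Hypothetical helper; all timestamps are assumed to be in milliseconds.
final class ExchangeTimings {
    private ExchangeTimings() {}

    // Client-side delta (t4 - t1): computable by the probe that sees t1 and t4.
    static long responseTime(long t1, long t4) {
        return t4 - t1;
    }

    // Server-side delta (t3 - t2): computable by the probe that sees t2 and t3.
    static long processingTime(long t2, long t3) {
        return t3 - t2;
    }

    // SOALatency = t2 - t1 + t4 - t3, rewritten as (t4 - t1) - (t3 - t2)
    // so that it only combines the two locally computed deltas.
    static long soaLatency(long clientDelta, long serverDelta) {
        return clientDelta - serverDelta;
    }
}
```

For example, with t1 = 1000 and t4 = 1250 on the client clock and t2 = 5040 and t3 = 5200 on a server clock that is not synchronized with it, the deltas are 250 and 160, so SOALatency is 90. Applying t2-t1+t4-t3 directly gives the same 90, which shows that only the two deltas need to cross machine boundaries.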


Data that have to be stored and made available in memory:

TODO from EasiFab requirements (xls & ppt) and Talend model (http://jira.talendforge.org/browse/TESB-1682)

  • MI_OPERATION_NAME (Talend). Type: varchar. Description: service operation name of event creator.
  • SourceTimeStamp (Jasmine). Type: Date (YY/MM/DD hhmmss). Description: these are the data in the input file of JASMINe's Collector.

Questions about timestamps and aggregation

  • The source timestamp equals the event collection / monitoring timestamp if individual events are put synchronously in the Jasmine database. => Are they? Or are several put at a time (event list, more efficient)?
  • The monitoring timestamp is relative to the computer it comes from. However, both time deltas t4-t1 and t3-t2 are absolute, therefore ResponseTime and SOALatency are too. LATER absolute timestamps could be useful to map service call ordering within processes, and for this they could be made absolute using a single global timestamp server.
  • Another problem concerns the t2 and t3 values, because t1 and t4 are measured in one Java virtual machine but t2 and t3 in another one. So one of the two parties (client or provider) would have to give its timestamp information to the other one for the SOALatency calculation. => Should we emit 4 events (simpler) and let them be aggregated in Jasmine, or should we already aggregate deltas from request and response on both sides (easy IF we let the service engine correlate them)?
  • Finally, who computes their mean value (only Jasmine can?!), so that we can collect "one QoSEvent every second"?
  • Other, higher-level monitoring events are computed in Jasmine by rules: ServiceEvent, ServiceState

Questions about identification & correlation

  • Unique identification of a service endpoint in the chosen deployment configuration: the endpoint URL is not enough, the IP is required (and not loopback's 127.0.0.1); the hostname is nice to have; the deployment configuration can be identified by System and Environment (see model)
  • Request to response correlation: done by the Proxy service engine. Asynchronous message responses should be correlated with their requests using the engine's appropriate programming model, e.g. Java Futures in JAXWS / CXF.
  • Message correlation across client and server: done at protocol level. HTTP (priority): a dedicated header put by the consumer probe at emission, read by the provider probe at reception etc. SOAP (later): could be the standard WS-Addressing MessageID
  • Correlation between several messages of the same process: a custom correlation id has to be put in the protocol by the service engine (just like the message id), and forwarded (or not) according to the process perimeter (delimited by the process engine, e.g. Talend flow => FLOW_ID, Bonita workflow...)
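
The HTTP header approach to client / server correlation could be sketched as below. The header name `X-EasySOA-Correlation-Id` and the `CorrelationHeader` class are hypothetical (this page does not name the header), and headers are modeled as a plain `Map<String, String>` rather than a real HTTP API.

```java
import java.util.Map;
import java.util.UUID;

// Hypothetical helper shared by the consumer-side and provider-side probes.
final class CorrelationHeader {
    // Assumed header name; not specified by the design.
    static final String NAME = "X-EasySOA-Correlation-Id";

    private CorrelationHeader() {}

    // Consumer probe, at emission: adds a fresh id unless one is already set,
    // and returns the id now carried by the message.
    static String stamp(Map<String, String> headers) {
        return headers.computeIfAbsent(NAME, k -> UUID.randomUUID().toString());
    }

    // Provider probe, at reception: reads the id, or returns null if absent.
    // Real HTTP header names are case-insensitive; a plain map lookup is not.
    static String read(Map<String, String> headers) {
        return headers.get(NAME);
    }
}
```

Both probes can then attach the same id to the monitoring events they emit, so that events from the two sides can be joined later. Keeping an existing id instead of overwriting it is one possible way to forward a correlation across a chain of calls.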

Other questions / to be validated

Service id in EasySOA : depends on the model, probably System(Path)(+Env(Name))+ServiceNameOrURL

Participant : NOT KNOWN IN THE EXCHANGE EVENT though could be inferred from matching its info (Java thread Principal, proxy conf) with EasySOA's, ex. through an (LdapPrincipal)ExchangeParticipantInferrer / in a Jasmine rule.

QoSEvent Interface : key-value map or dedicated class (or both) ?
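
To make the "or both" option concrete, here is one possible shape: a dedicated class with typed accessors for well-known fields, backed by a key-value map for everything else. The field names, the constructor and the `with` / `asMap` methods are assumptions for illustration, not a proposed final API.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical QoSEvent offering both access styles over the same data.
class QoSEvent {
    private final Map<String, Object> fields = new LinkedHashMap<>();
    private final String operationName;
    private final long responseTimeMillis;

    QoSEvent(String operationName, long responseTimeMillis) {
        this.operationName = operationName;
        this.responseTimeMillis = responseTimeMillis;
        fields.put("operationName", operationName);
        fields.put("responseTime", responseTimeMillis);
    }

    // Dedicated-class style: typed accessors for the well-known fields.
    String getOperationName() {
        return operationName;
    }

    long getResponseTime() {
        return responseTimeMillis;
    }

    // Key-value style: adds a runtime-specific field without changing the class.
    // Well-known fields cannot be overwritten this way.
    QoSEvent with(String key, Object value) {
        fields.putIfAbsent(key, value);
        return this;
    }

    // Read-only map view, e.g. for a generic collector.
    Map<String, Object> asMap() {
        return Collections.unmodifiableMap(fields);
    }
}
```

A generic collector could rely on `asMap()` alone, while code that knows the model uses the typed accessors.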

Message API or Code invocation API, or both (for convenience but also "as close as possible") ?

The monitoring architecture has to be flexible, so that it can be fed by QoSEvents from any service runtime. This is possible by developing a service runtime-specific Jasmine Collector (TODO Probe also ??) that collects QoSEvents from wherever the runtime stores them (TODO or even synchronously puts them in Jasmine's database). Jasmine rules and EasySOA information can then be used to trigger higher-level SOA events.

Talend compatibility

see example http://jira.talendforge.org/browse/TESB-1682

Final model

ExchangeEvent :

  • ...
