add reset_stream to accumulate #249
|
(without looking at the code) can you comment on the connection between this and things like zip_latest, where different input streams have different roles and (in your other PR) possibly different representations? I agree that many nodes may have a concept of "reset" which will mean different things in context - often rerunning code that currently lives in |
|
Overall I think incoming edges meaning different things to a node is a good thing. It provides a level of flexibility that would be hard to come by otherwise, and the alternatives could create confusion. For example, I think

    s1 = Stream()
    s2 = Stream()
    s1.accumulate(operator.add, reset_stream=s2).sink(print)

is more readable than

    s1 = Stream()
    s2 = Stream()
    s3 = s1.accumulate(operator.add)
    s3.sink(print)
    # a lambda cannot contain an assignment, so use setattr to reset the state
    s2.sink(lambda x: setattr(s3, "state", no_default))

In the implementation I think everything will need to be stored in
|
|
Certainly agree that your second snippet there is hard to read; in fact, it took me a moment to realise what it was doing at all. I really want to spend some time thinking about how this could be generalised, though. There are a number of streamz types that could be "reset", and some others that might also take a different control input (like emit_on and flush); while some like |
|
I like the idea of having different edge properties for the visualization of control edges (e.g. dotted, dashed, or colored). I need to think about the generalization. Maybe have a superclass with a clear method which can be overridden as needed? I need to think/prototype around this more, since I'd like to find corner cases where reset produces odd behavior. @martindurant do you want the generalization in this PR, or can it go into a separate one?
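A rough sketch of what the superclass idea could look like. Everything here is hypothetical (the names `ResettableNode`, `RunningSum`, `Buffer`, `PassThrough` are made up for illustration and are not streamz classes); it only shows a base class with a no-op `clear` that stateful nodes override.

```python
class ResettableNode:
    """Hypothetical base class: by default there is no state to clear."""

    def clear(self):
        # Stateless nodes inherit this no-op.
        pass


class RunningSum(ResettableNode):
    """Accumulate-like node: clearing drops the running total."""

    def __init__(self):
        self.total = 0

    def update(self, x):
        self.total += x
        return self.total

    def clear(self):
        self.total = 0


class Buffer(ResettableNode):
    """Collecting node: clearing empties the buffer."""

    def __init__(self):
        self.items = []

    def update(self, x):
        self.items.append(x)
        return list(self.items)

    def clear(self):
        self.items.clear()


class PassThrough(ResettableNode):
    """Stateless node: relies on the inherited no-op clear()."""

    def update(self, x):
        return x


nodes = [RunningSum(), Buffer(), PassThrough()]
for node in nodes:
    node.update(1)
    node.update(2)

# A single control signal can reset every node the same way,
# while each node decides what "reset" means for its own state.
for node in nodes:
    node.clear()
```

Corner cases such as nodes with in-flight or buffered data would still need their own `clear` semantics; this sketch does not address them.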
One issue I've run into is resetting the state of accumulate nodes.
The current operating procedure is that, when there is a signal to clear the accumulate node, the signal is sent to a sink which resets the accumulate node's state.
However, this causes "spooky action at a distance", where two nodes which are completely unrelated in the graph have a very close relationship, making the structure of the pipeline more difficult to understand (naming credit to tacaswell).
This PR fixes this by providing a dedicated stream via a kwarg.
When this stream receives data, the accumulate node's state is reset.
Since this stream is formally tracked in the node's upstreams, it is properly recorded in the visualized graph and behaves properly when connected or disconnected.
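
To illustrate the behaviour described above, here is a minimal, self-contained sketch. It does not use streamz itself: `Node`, `Accumulate`, `Collect`, and the `_NO_DEFAULT` sentinel are toy stand-ins written for this example, and the actual implementation in this PR may differ. The point is only that the reset stream is registered as an upstream, so the relationship is part of the graph rather than hidden in a sink.

```python
import operator

_NO_DEFAULT = object()  # hypothetical sentinel meaning "no state yet"


class Node:
    """Toy push-based node; a stand-in, not the streamz Stream class."""

    def __init__(self, upstreams=()):
        self.upstreams = list(upstreams)
        self.downstreams = []
        for up in self.upstreams:
            up.downstreams.append(self)

    def emit(self, x):
        # Pass the value on, telling each downstream node who sent it.
        for down in list(self.downstreams):
            down.update(x, who=self)

    def update(self, x, who=None):
        self.emit(x)


class Accumulate(Node):
    """Folds incoming data with func; data on reset_stream clears the state."""

    def __init__(self, upstream, func, reset_stream=None):
        self.func = func
        self.state = _NO_DEFAULT
        self.reset_stream = reset_stream
        upstreams = [upstream]
        if reset_stream is not None:
            upstreams.append(reset_stream)  # tracked like any other edge
        super().__init__(upstreams)

    def update(self, x, who=None):
        if self.reset_stream is not None and who is self.reset_stream:
            self.state = _NO_DEFAULT  # control edge: reset, emit nothing
            return
        if self.state is _NO_DEFAULT:
            self.state = x
        else:
            self.state = self.func(self.state, x)
        self.emit(self.state)


class Collect(Node):
    """Terminal node that records everything it receives."""

    def __init__(self, upstream):
        super().__init__([upstream])
        self.items = []

    def update(self, x, who=None):
        self.items.append(x)


data, control = Node(), Node()
acc = Accumulate(data, operator.add, reset_stream=control)
out = Collect(acc)

for x in (1, 2, 3):
    data.emit(x)
control.emit("reset")  # any value on the control stream clears the state
for x in (10, 20):
    data.emit(x)

print(out.items)  # [1, 3, 6, 10, 30]
```

Because `control` appears in `acc.upstreams`, a graph visualizer that walks upstream edges would see the control relationship without any extra bookkeeping.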