Showing content from https://vatlab.github.io/sos-docs/doc/user_guide/language_module.html below:

language_module

Writing a new language module

Role of a language module

SoS can interact with any Jupyter kernel. As shown in the SoS notebook tutorial, SoS can

without knowing what the kernel does.

However, if the kernel supports the concept of variables (not all kernels do), a language module for the kernel allows SoS to work more efficiently with it. More specifically, SoS can

Although data exchange among subkernels is powerful, it is important to understand that SoS does not transfer any variables among kernels. Instead, it creates independent variables with the same names and similar types that are native to the destination language. For example, if you have the following two variables

in R and execute a magic

in a SoS cell, SoS actually executes the following statements, in the background, to create variables a and b in Python

These variables are independent, so changing the value of a or b in one kernel does not affect the other. Note also that a and b are of different types in Python although they are of the same numeric type in R (technically speaking, a is an array of size 1).

Define a new language module

The best way to start a new language module is to read the source code of an existing language module and adapt it to your language. Our GitHub organization has a number of language modules. Module sos-r is a good choice, and you should try to match the corresponding items with the code in kernel.py when going through this tutorial.

To support a new language, you will need to write a Python package that defines a class, say mylanguage, that provides the following class attributes:

supported_kernels

supported_kernels should be a dictionary that maps each language to the names of the kernels that support it. For example, ir is the name of the kernel for language R, so this attribute should be defined as:

supported_kernels = {'R': ['ir']}

If multiple kernels are supported, SoS will look for a kernel with a matching name in the order specified. This is the case for JavaScript, where multiple kernels are available:

supported_kernels = {'JavaScript': ['ijavascript', 'inodejs']}

Multiple languages can be specified if a language module supports multiple languages. For example, MATLAB and Octave share the same language module

supported_kernels = {'MATLAB': ['imatlab', 'matlab'], 'Octave': ['octave']}

Wildcard characters are allowed in kernel names, which is useful for kernels whose names contain version numbers:

supported_kernels = {'Julia': ['julia-?.?']}

Finally, if SoS cannot find any kernel that it recognizes, it will look into the language information of the kernelspec.
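The ordered lookup with wildcards described above can be sketched as follows. This is only an illustration and not the code SoS uses: the function find_kernel and the list of available kernels are hypothetical, and fnmatch from the Python standard library is assumed to be an adequate stand-in for whatever wildcard matching SoS performs.

```python
from fnmatch import fnmatchcase


def find_kernel(supported_kernels, language, available_kernels):
    """Return the first available kernel that matches a pattern listed for `language`.

    Hypothetical helper: patterns are tried in the order they are listed,
    and each pattern may contain wildcard characters such as `?`.
    """
    for pattern in supported_kernels.get(language, []):
        for kernel in available_kernels:
            if fnmatchcase(kernel, pattern):
                return kernel
    return None


supported_kernels = {
    'JavaScript': ['ijavascript', 'inodejs'],
    'Julia': ['julia-?.?'],
}
# Made-up list of kernels installed on the system.
available = ['python3', 'inodejs', 'julia-1.6']

print(find_kernel(supported_kernels, 'JavaScript', available))  # inodejs
print(find_kernel(supported_kernels, 'Julia', available))       # julia-1.6
print(find_kernel(supported_kernels, 'R', available))           # None
```

Because ijavascript is not available in this made-up list, the second name inodejs is picked for JavaScript.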

background_color

background_color should be a color name or a #XXXXXX value that will be used in the prompt area of cells executed by the subkernel. An empty string selects the default notebook color. If the language module defines multiple languages, a dictionary {language: color} can be used to specify a different color for each supported language. For example,

background_color = {'MATLAB': '#8ee7f1', 'Octave': '#dff8fb'}

is used for MATLAB and Octave.
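Since background_color can be either a single string or a dictionary, code that consumes it has to handle both forms. A minimal sketch, assuming a hypothetical helper resolve_background_color that is not part of SoS:

```python
def resolve_background_color(background_color, language):
    """Return the color for `language`, or '' for the default notebook color.

    Hypothetical helper: accepts either a single color string or a
    {language: color} dictionary.
    """
    if isinstance(background_color, dict):
        return background_color.get(language, '')
    return background_color


colors = {'MATLAB': '#8ee7f1', 'Octave': '#dff8fb'}
print(resolve_background_color(colors, 'Octave'))   # #dff8fb
print(resolve_background_color('#DCDCDA', 'R'))     # #DCDCDA
```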

cd_command

cd_command is a command that changes the current working directory, with {dir} interpolated with the option of the %cd magic. For example, the command for R is

cd_command = 'setwd({dir!r})'

where !r quotes the provided dir. Note that { } is interpolated as in a Python f-string, but the string itself should not carry an f prefix.
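The effect of {dir!r} can be reproduced with plain str.format, which applies repr() to the value. This only illustrates the substitution; how SoS performs the interpolation internally is not shown here, and the directory name is made up.

```python
cd_command = 'setwd({dir!r})'

# !r applies repr() to the value, which adds the surrounding quotes.
statement = cd_command.format(dir='/tmp/my data')
print(statement)  # setwd('/tmp/my data')
```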

options

A Python dictionary of options that will be passed to the frontend. Currently two options, variable_pattern and assignment_pattern, are supported. Both should be regular expressions in JavaScript style.

Both options are optional.

__version__

This attribute, if provided, will be included in the debug message when the language module is loaded. This helps you, for example, to check whether the correct version of the language module has been loaded if you have multiple instances of Python, SoS, and/or the language module available.
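Putting the class attributes together, a skeleton might look like the sketch below. Everything specific in it is hypothetical: the class name sos_MyLanguage, the language and kernel names, the color, the chdir command, and both regular expressions, which are only guesses at what a variable and an assignment could look like in an imaginary language.

```python
import re


class sos_MyLanguage:
    # Hypothetical language and kernel names, for illustration only.
    supported_kernels = {'MyLanguage': ['mylang', 'mylang-?.?']}
    # Color of the prompt area for cells executed by this subkernel.
    background_color = '#e6f2ff'
    # Hypothetical command of the imaginary language; note the missing f prefix.
    cd_command = 'chdir({dir!r})'
    # Assumed patterns; the real ones depend on the syntax of your language.
    options = {
        'variable_pattern': r'^\s*[A-Za-z_][A-Za-z0-9_]*\s*$',
        'assignment_pattern': r'^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=.*$',
    }
    __version__ = '0.1.0'


# The patterns are meant for the frontend, but these simple ones can also be
# checked with Python's re module.
match = re.match(sos_MyLanguage.options['assignment_pattern'], 'x = 1')
print(match.group(1))  # x
```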

An instance of the class is initialized with the SoS kernel and the name of the subkernel, which does not have to be one of the supported_kernels (it could be user-defined). The instance should provide the following attributes and functions. Because these attributes are instantiated with the kernel name, they can vary (slightly) from kernel to kernel.

String init_statements

init_statements is a statement that will be executed by the subkernel when the kernel starts. This statement usually defines a number of utility functions.

Function get_vars(self, names)

Function get_vars should be a Python function that transfers the specified Python variables to the subkernel. We will discuss this in detail in the next section.

Function put_vars(self, items, to_kernel=None)

Function put_vars should be a Python function that puts one or more variables from the subkernel to SoS or another subkernel. We will discuss this in detail in the next section.

Function expand(self, text, sigil) (new in SoS Notebook 0.20.8)

Function expand should be a Python function that accepts text (most likely in Markdown format) with inline expressions, evaluates the expressions in the subkernel, and returns the expanded text. This can be used by the markdown kernel to execute inline expressions of, for example, R markdown text.

Function preview(self, item)

Function preview accepts a name, which should be the name of a variable in the subkernel. This function should return a tuple of two items (desc, preview) where

Function sessioninfo(self)

Function sessioninfo should be a Python function that returns information about the running kernel, usually including the version of the language, the kernel, and the currently used packages and their versions. For R, this means a call to the sessionInfo() function. The return value of this function can be

The function will be called by the %sessioninfo magic of SoS.

Obtain variable from SoS

The get_vars function should be defined as

def get_vars(self, var_names)

where

This function is responsible for probing the type of each Python variable and creating a similar object in the subkernel.

For example, to create a Python object b = [1, 2] in R (magic %get), this function could

  1. Obtain an R expression to create this variable (e.g. b <- c(1, 2))
  2. Execute the expression in the subkernel to create variable b in it.

Note that the function get_vars can change the variable name because a valid variable name in Python might not be a valid variable name in another language. The function should give a warning (call self.sos_kernel.warn()) if this happens.
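The two steps and the renaming rule can be sketched as below. This is a simplified illustration, not the sos-r implementation: FakeSoSKernel is a stand-in that only records calls (its parameter names are made up), r_repr is a hypothetical converter that covers a few types, and the plain dictionary sos_dict passed to the constructor is an assumed stand-in for wherever the Python variables actually live.

```python
class FakeSoSKernel:
    """Stand-in for the SoS kernel that only records what it is asked to do."""

    def __init__(self):
        self.statements = []
        self.warnings = []

    def run_cell(self, statement, silent, store_history, on_error=''):
        self.statements.append(statement)

    def warn(self, msg):
        self.warnings.append(msg)


def r_repr(obj):
    """Return an R expression for a small subset of Python types (hypothetical)."""
    if isinstance(obj, bool):  # must be tested before int
        return 'TRUE' if obj else 'FALSE'
    if isinstance(obj, (int, float)):
        return repr(obj)
    if isinstance(obj, str):
        return '"' + obj.replace('\\', '\\\\').replace('"', '\\"') + '"'
    if isinstance(obj, (list, tuple)):
        return 'c(' + ', '.join(r_repr(x) for x in obj) + ')'
    raise TypeError(f'Unsupported type {type(obj).__name__}')


class sos_Demo:
    def __init__(self, sos_kernel, kernel_name, sos_dict):
        self.sos_kernel = sos_kernel
        self.kernel_name = kernel_name
        self.sos_dict = sos_dict  # assumed source of the Python variables

    def get_vars(self, names):
        for name in names:
            new_name = name
            if name.startswith('_'):
                # R names cannot start with an underscore, so rename and warn.
                new_name = '.' + name[1:]
                self.sos_kernel.warn(f'Variable {name} is passed as {new_name}')
            # Step 1: build the R expression; step 2: run it in the subkernel.
            statement = f'{new_name} <- {r_repr(self.sos_dict[name])}'
            self.sos_kernel.run_cell(
                statement, True, False,
                on_error=f'Failed to get variable {name}')


kernel = FakeSoSKernel()
lang = sos_Demo(kernel, 'ir', {'b': [1, 2], '_n': 3})
lang.get_vars(['b', '_n'])
print(kernel.statements)  # ['b <- c(1, 2)', '.n <- 3']
print(kernel.warnings)    # ['Variable _n is passed as .n']
```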

Send variables to other kernels

The put_vars function should be defined as

def put_vars(self, var_names, to_kernel=None)

where

  1. self is the language instance with access to the SoS kernel
  2. var_names is a list of variables that should exist in the subkernel.
  3. to_kernel is the destination kernel to which the variables should be passed.

Depending on the destination kernel, this function can:

So basically, a language module can start with an implementation of put_vars(to_kernel='sos') and let SoS handle the rest. If the need arises, it can

NOTE: SoS Notebook before version 0.20.5 supported a feature called automatic variable transfer, which automatically transferred variables with names starting with sos between kernels. This feature has been deprecated (#253).

For example, to send an R object b <- c(1, 2) from subkernel R to SoS (magic %put), this function can

  1. Execute a statement in the subkernel to get the value(s) of variable(s) in some format, for example, a string "{'b': [1, 2]}".
  2. Post-process these variables to return a dictionary to SoS.

The R sos extension provides a good example to get you started.
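A minimal sketch of the two steps for the to_kernel='sos' case follows. It makes several assumptions that are not from SoS itself: FakeSoSKernel is a stand-in that returns a canned reply, the content of a stream message is assumed to be a dictionary with a 'text' entry, and dump_as_python is a hypothetical function defined in the subkernel that prints the variables as a Python dictionary literal.

```python
import ast


class FakeSoSKernel:
    """Stand-in for the SoS kernel that returns a canned stdout message."""

    def __init__(self, stdout_text):
        self.stdout_text = stdout_text
        self.warnings = []

    def get_response(self, statement, msg_types, name=()):
        # Assumed shape of the result: a list of (msg_type, msg_data) pairs.
        return [('stream', {'name': 'stdout', 'text': self.stdout_text})]

    def warn(self, msg):
        self.warnings.append(msg)


class sos_Demo:
    def __init__(self, sos_kernel, kernel_name='demo'):
        self.sos_kernel = sos_kernel
        self.kernel_name = kernel_name

    def put_vars(self, var_names, to_kernel=None):
        # Step 1: ask the subkernel to print the variables in a parseable form.
        # dump_as_python is a hypothetical utility defined in the subkernel.
        args = ', '.join(f'"{name}"' for name in var_names)
        responses = self.sos_kernel.get_response(
            f'dump_as_python({args})', ('stream',), name=('stdout',))
        if not responses:
            self.sos_kernel.warn(f'No response for variables {var_names}')
            return {}
        # Step 2: post-process the text into a dictionary for SoS.
        try:
            return ast.literal_eval(responses[0][1]['text'])
        except (ValueError, SyntaxError) as e:
            self.sos_kernel.warn(f'Failed to parse variables: {e}')
            return {}


lang = sos_Demo(FakeSoSKernel("{'b': [1, 2]}"))
print(lang.put_vars(['b']))  # {'b': [1, 2]}
```

ast.literal_eval is used instead of eval so that only literals are accepted from the subkernel output.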

NOTE: Unlike other language extension mechanisms, in which the Python module can get hold of the "engine" of the interpreter (e.g. saspy and MATLAB's Python extension start the interpreter for direct communication) or has access to a lower-level API of the language (e.g. rpy2), SoS only has access to the interface of the language and performs all conversions by executing commands in the subkernels and intercepting their responses. Consequently,

  1. Data exchange can be slower than other methods.
  2. Data exchange is less dependent on version of the interpreter.
  3. Data exchange can happen between a local and a remote kernel.

Also, although it can be more efficient to save large datasets to disk files and load in another kernel, this method does not work for kernels that do not share the same filesystem. We currently ignore this issue and assume all kernels have access to the same file system.

With access to an instance of SoS kernel, you can call various functions of this kernel. However, the SoS kernel does not provide a stable API yet so you are advised to use only the following functions:

sos_kernel.warn(msg)

This function produces a warning message.

sos_kernel.run_cell(statement, True, False, on_error='msg')

Execute a statement in the current subkernel, with True, False indicating that the execution should be done in the background and no output should be displayed. A message on_error will be displayed if the statement fails to execute.

sos_kernel.get_response(statement, msg_type, name)

This function executes the statement and collects messages sent back from the subkernel. Only messages of the specified msg_type are kept (e.g. stream, display_data), and name can be one or both of stdout and stderr when stream is specified.

The returned value is a list of

msg_type, msg_data
msg_type, msg_data
...

so

self.sos_kernel.get_response('ls()', ('stream', ), 
                name=('stdout', ))[0][1]

runs the function ls() in the subkernel, collects stdout, and gets the content of the first message.
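A subkernel may split its output over several messages, so taking only the first one can lose text. The sketch below joins all stdout messages of a result list with the (msg_type, msg_data) shape described above. The helper collect_stream_text is hypothetical, and the assumption that msg_data for a stream message is a dictionary with 'name' and 'text' entries should be verified against real output, for example with the debugging aids in the next section.

```python
def collect_stream_text(responses, name='stdout'):
    """Join the text of all stream messages written to `name` (hypothetical helper)."""
    return ''.join(
        data.get('text', '')
        for msg_type, data in responses
        if msg_type == 'stream' and data.get('name') == name
    )


# Hand-written example of what a result list might look like.
responses = [
    ('stream', {'name': 'stdout', 'text': '[1] "a" '}),
    ('stream', {'name': 'stderr', 'text': 'a warning\n'}),
    ('stream', {'name': 'stdout', 'text': '"b"\n'}),
]
print(repr(collect_stream_text(responses)))  # '[1] "a" "b"\n'
```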

Debugging

If you have trouble figuring out which messages have been returned from subkernels (e.g. display_data and stream can look alike), you can use the %capture magic to show them in the console panel.

You can also define the environment variable SOS_DEBUG=MESSAGE (or MESSAGE,KERNEL etc.) before starting the notebook server. This will cause SoS to, among other things, log messages processed by the get_response function to ~/.sos/sos_debug.log.

Logging

If you would like to add your own debug messages to the log file, you can

from sos.utils import env

env.log_to_file('VARIABLE', f'Processing {var} of type {var.__class__.__name__}.')

If the log message can be expensive to format, you can check if SOS_DEBUG is defined before logging to the log file:

if 'VARIABLE' in env.config['SOS_DEBUG'] or 'ALL' in env.config['SOS_DEBUG']:
    env.log_to_file('VARIABLE', f'Processing {var} of type {var.__class__.__name__}.')

Although you can test your language module in many ways, it is highly recommended that you adopt a standard set of selenium-based tests that are executed by pytest. To create and run these tests, you should

Test files

The test suite contains three files:

All tests should be derived from the class NotebookTest defined in sos_notebook.test_utils, and use the pytest fixture notebook as follows:

from sos_notebook.test_utils import NotebookTest

class TestDataExchange(NotebookTest):
    def test_something(self, notebook):
        pass

The notebook fixture

The notebook fixture that is passed to each test function contains a notebook instance that you can operate on. Although there are a large number of functions, you most likely only need to learn two of them for your tests:

  1. notebook.call(statement, kernel, expect_error=False)

This function appends a new cell to the end of the notebook, inserts the specified statement as its content, changes the kernel of the cell to kernel, and executes the cell. It automatically dedents statement, so you can indent multiple statements in the call:

notebook.call('''\
          %put df --to R
          import pandas as pd
          import numpy as np
          arr = np.random.randn(1000)
          arr[::10] = np.nan
          df = pd.DataFrame({'column_{0}'.format(i): arr for i in range(10)})
          ''', kernel='SoS')

This function returns the index of the cell so that you can call notebook.get_cell_output(idx) if needed. If you are supposed to see some warning messages, use expect_error=True. Otherwise the function will raise an exception that fails the test.

  2. notebook.check_output(statement, kernel, expect_error=False, selector=None, attribute=None)

This function calls notebook.call(statement, kernel) and then notebook.get_cell_output(idx, selector, attribute) to get the output. The output contains all the text of the output, plus additional text from non-text elements. For example, selector='img', attribute='src' would return the text in <img src="blah"> output. Using this function, most of your unit tests can look like the following

def test_sessioninfo(self, notebook):
    assert 'R version' in notebook.check_output(
        '%sessioninfo', kernel="SoS")

Registering the new language module

To register a language module with SoS, you need to add your module to an entry point under the section sos_language. This can be done by adding something like the following to your setup.py:

entry_points='''
[sos_language]
Perl = sos_perl.kernel:sos_Perl
'''

Once this package is installed, SoS will be able to import the class sos_Perl from module sos_perl.kernel and use it to work with the Perl language.

