Remote Control for Videoconferencing

Project Personnel: Marcia Perry (MPerry@lbl.gov) and Deb Agarwal (DAAgarwal@lbl.gov)

Lawrence Berkeley National Laboratory
Distributed Systems Department

Project Results

We have designed, implemented, and deployed a camera control system and a conference controller that provide remote control capabilities for videoconferencing over the Internet. The camera control system allows users to pan, tilt, and zoom the cameras, switch between cameras, and get a picture-in-picture view, all from their desktops. The conference controller allows conference participants not only to start and stop the media tools on a remote host, but also to change settings dynamically and turn transmission on and off. It supports the vic (video) and vat (audio) Internet videoconferencing tools and enhances their usability by providing an integrated and secure user interface for local and remote control of these applications. This project provided some of the funding for the design and implementation of the camera control system (devserv and camclnt) and the conference controller (confcntlr). The remote control capabilities offered by these tools have changed the videoconferencing paradigm to one of telepresence. With these tools, remote users can "walk" around the room, focus in on things, and feel more like participants than observers. The devserv, camclnt, and confcntlr software is released and available for download over the web from http://www-itg.lbl.gov/mbone/remote/. The remote camera control tools have been integrated into the CERN VRVS system and other collaborative environments. These tools have provided a platform for the study of telepresence.

Remote Camera Control System

The camera control system consists of a device server (devserv) and a client (camclnt) that allow users to control the video devices. Devserv runs on the host connected to the devices, and camclnt is the graphical user interface, which can be run anywhere. Participants using camclnt and watching the video can select the camera to view, and they can pan, tilt, or zoom any available camera. They can also create or move a picture-in-picture view if there is a videoswitcher. In addition to providing a means of controlling video devices from the desktop, devserv and camclnt are extensible and cross-platform. Devserv was written in C++ and camclnt was written in Java. The code has been tested on Solaris, FreeBSD, Linux, Irix, and Windows 95/98/NT. The server currently supports the Sony EVI-D30, Canon VC-C1, and Canon VC-C3 cameras and the Panasonic WJ-MX50 videoswitcher.

System Design

Client requests are transmitted using UDP unicast and IP multicast connections; the server uses IP multicast to send messages. Servers and clients can execute on the same or different hosts and any number of hosts can join a multicast group. Our system supports both the socket interface and the common communication library developed under the Collaboratory Interoperability Framework (CIF) project[1]. The CIF library provides a simple uniform interface to low-level network protocols providing reliable and unreliable unicast and multicast. CIF has implementations available in both C++ and Java and these implementations interoperate seamlessly.
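
With the plain socket interface, the two transport paths described above can be sketched in Python as follows. This is an illustrative sketch only, not code from devserv or camclnt: the function names are hypothetical, and the multicast group and port would come from the conference configuration.

```python
import socket
import struct


def send_command(server: tuple, command: str) -> None:
    """Send one ASCII command to a server over UDP unicast."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(command.encode("ascii"), server)


def open_description_listener(group: str, port: int) -> socket.socket:
    """Join an IP multicast group to receive description messages."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Allow several clients on the same host to listen on the same port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Group address followed by the local interface (0.0.0.0 = any).
    membership = struct.pack(
        "4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0")
    )
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership)
    return sock
```

A client would call `send_command` for each user action and read descriptions from the socket returned by `open_description_listener` in a separate receiving thread.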

Requests to move the video devices are sent by the client to a server using UDP unicast. The server then drives the devices (via RS232 serial communication) and sends a description message using IP multicast. Clients can also send a request to the server to send a description message. Commands and descriptions are ASCII strings and are defined in our Remote Camera Control Language[2]. All messages contain a header that includes a timestamp and the name of the server. Descriptions contain status information such as conference information (address, port, etc.) and the state of the connected devices (e.g., type, degrees of freedom, current positions). Commands specify the device, degree of freedom, and the appropriate values. The command set supports absolute, relative, and fractional camera movements. For example, "cam 3 tilt R -20 1" is a command for camera three to tilt by -20 degrees relative to its current position at maximum speed, and "cam 1 pan F 0.5 1" is a command to pan camera one to the right by one-half of an image, at maximum speed. New clients may be written to work with devserv by implementing the Remote Camera Control Language.
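
The two example commands above suggest a simple whitespace-separated layout. The following Python sketch parses and formats commands of that shape. It is based only on those two examples, not on the Remote Camera Control Language definition[2]: the field names, the `CameraCommand` class, and the use of "A" for absolute moves are assumptions, and the message header (timestamp and server name) is left out because its format is not given here.

```python
from dataclasses import dataclass

# "R" (relative) and "F" (fractional) appear in the examples in the text;
# "A" for absolute moves is an assumption.
MODES = {"A": "absolute", "R": "relative", "F": "fractional"}


@dataclass(frozen=True)
class CameraCommand:
    device: str    # device keyword, e.g. "cam"
    number: int    # which device of that kind, e.g. 3
    dof: str       # degree of freedom, e.g. "pan" or "tilt"
    mode: str      # one of the MODES keys
    value: float   # degrees, or a fraction of the image for mode "F"
    speed: float   # 1 denotes maximum speed in the examples

    def format(self) -> str:
        """Render the command back into its ASCII string form."""
        return (f"{self.device} {self.number} {self.dof} "
                f"{self.mode} {self.value:g} {self.speed:g}")


def parse_command(text: str) -> CameraCommand:
    """Parse a command string such as 'cam 3 tilt R -20 1'."""
    fields = text.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {text!r}")
    device, number, dof, mode, value, speed = fields
    if mode not in MODES:
        raise ValueError(f"unknown movement mode: {mode!r}")
    return CameraCommand(device, int(number), dof, mode,
                         float(value), float(speed))
```

Because commands are plain ASCII, a new client only needs an equivalent of `format` to talk to a server; parsing is mainly of interest on the server side.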

In order to control access to the devices, we have incorporated the Secure Socket Layer (SSL) to provide a secure connection between the client and the server, and the Akenti authorization system to verify client authorization[19]. Servers and clients are identified by X.509 identity certificates. This information, along with the master-secret generated by the SSL handshake, is cached locally by each host and used to perform an access control check for each request.

Devserv

Devserv's class structures for networking and devices are shown in Figure 1. In the network hierarchy, classes for specific connection types (UDP unicast, IP multicast, and SSL) are derived from a Network abstract base class that encapsulates common socket properties (e.g., socket identifier, open/close, send/receive). For CIF, objects from the CIF library are used. The hardware device classes mirror physical objects. A class for each device is derived from the class that encapsulates the properties of its category (e.g., the Camera or Videoswitcher class) and these base classes are derived from the Device abstract base class which represents attributes that are common to all programmable devices (e.g., move, transmit, receive). At runtime an object is instantiated for each device that is connected. Classes for new devices can be easily derived from the existing classes. For example, a new videoswitcher could be derived from the Videoswitcher class. The network class structure can be extended similarly.
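
The device side of this hierarchy can be sketched as follows. Devserv itself is written in C++; this Python version only mirrors the structure described above. The method names, the body of the `SonyEVID30` class, and the placeholder byte encoding are illustrative assumptions and do not reproduce any real device protocol.

```python
from abc import ABC, abstractmethod


class Device(ABC):
    """Attributes common to all programmable devices."""

    def __init__(self, name: str):
        self.name = name
        self.sent = []  # stands in for bytes written to the serial port

    def transmit(self, data: bytes) -> None:
        # A real server would write to the RS232 port here.
        self.sent.append(data)

    @abstractmethod
    def move(self, dof: str, value: float) -> None:
        """Move one degree of freedom to the given value."""


class Camera(Device):
    """Properties shared by all cameras."""

    degrees_of_freedom = ("pan", "tilt", "zoom")

    def __init__(self, name: str):
        super().__init__(name)
        self.position = {dof: 0.0 for dof in self.degrees_of_freedom}

    def move(self, dof: str, value: float) -> None:
        if dof not in self.position:
            raise ValueError(f"{self.name} has no degree of freedom {dof!r}")
        self.position[dof] = value
        self.transmit(self.encode(dof, value))

    @abstractmethod
    def encode(self, dof: str, value: float) -> bytes:
        """Translate a move into this camera model's wire format."""


class SonyEVID30(Camera):
    def encode(self, dof: str, value: float) -> bytes:
        # Placeholder encoding, NOT the camera's real serial protocol.
        return f"{dof}={value:g}".encode("ascii")
```

Supporting another camera in this sketch means adding one subclass that overrides `encode`; the shared position bookkeeping stays in `Camera`.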

Figure 1: UML diagram of the devserv class structure

Upon startup, devserv determines the hardware and network configuration for the host from a configuration file and then opens the serial port and network connections and initializes the devices according to the configuration. It then receives and processes requests until program termination. For security, the Akenti software provides authorization for each identity and helps make access control decisions for requests. If security is enabled, an SSL connection is established between the client and the server to exchange and validate identity certificates for control authorization. The shared secret that is generated by this handshake (the master-secret) is cached and is used by devserv to make access control decisions. Devserv multicasts descriptions periodically or after carrying out a request. Requests are sequenced by their timestamps.
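
One way to sequence requests by timestamp is a priority queue keyed on the timestamp carried in each message header. The sketch below shows that idea in Python; it is not necessarily how devserv does it, and the `RequestQueue` name and the arrival counter used to break ties are assumptions.

```python
import heapq
import itertools


class RequestQueue:
    """Hand out queued requests in timestamp order."""

    def __init__(self):
        self._heap = []
        # Arrival counter keeps requests with equal timestamps in FIFO order.
        self._arrival = itertools.count()

    def push(self, timestamp: float, request: str) -> None:
        heapq.heappush(self._heap, (timestamp, next(self._arrival), request))

    def pop(self) -> str:
        """Return the pending request with the earliest timestamp."""
        _, _, request = heapq.heappop(self._heap)
        return request

    def __len__(self) -> int:
        return len(self._heap)
```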

The devserv program is threaded; a thread is created for each device and network connection. The device threads initialize, run, and shut down the devices. One thread is created for each type of network connection. These threads process incoming requests while the main thread sends descriptions. Thread synchronization uses "wait-and-signal" mechanisms and serial port synchronization is by locking objects.
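
The combination of a per-device thread, "wait-and-signal" synchronization, and a lock around the serial port can be sketched in Python as below. This is an illustration of the pattern rather than devserv's C++ code; the `DeviceWorker` class is hypothetical, and a plain list stands in for the serial port.

```python
import threading
from collections import deque


class DeviceWorker:
    """One thread per device: waits for commands, then drives the device."""

    def __init__(self, name: str, serial_lock: threading.Lock, log: list):
        self.name = name
        self._serial_lock = serial_lock  # shared by devices on one port
        self._log = log                  # stands in for the serial port
        self._pending = deque()
        self._cond = threading.Condition()
        self._running = True
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def submit(self, command: str) -> None:
        with self._cond:
            self._pending.append(command)
            self._cond.notify()          # signal the waiting device thread

    def shutdown(self) -> None:
        with self._cond:
            self._running = False
            self._cond.notify()
        self._thread.join()

    def _run(self) -> None:
        while True:
            with self._cond:
                while self._running and not self._pending:
                    self._cond.wait()    # sleep until signalled
                if not self._pending:
                    return               # stopped and nothing left to do
                command = self._pending.popleft()
            with self._serial_lock:      # one writer on the port at a time
                self._log.append(f"{self.name}: {command}")
```

Commands that are still queued when `shutdown` is called are carried out before the thread exits.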

Camclnt

Camclnt uses the Java I/O and networking packages to implement communication. At startup, camclnt opens its network connections and then multicasts a request for a description message. The description messages allow camclnt to discover the addresses of hosts running servers and what devices are connected to each server. This information is cached for later retrieval. Camclnt's graphical user interface displays a list of the servers discovered. When a server is selected, the window is reconfigured to show the information for that server.

Camclnt's graphical user interface is based on Java's Abstract Window Toolkit (AWT). The Java Media Framework (JMF) tools are incorporated to display the video. The main window contains the controls for selecting a server, selecting cameras to view and move, and specifying device commands. Commands are sent to the server when the user selects a device, clicks in the pan-and-tilt area, or manipulates a zoom control. Fractional moves and a picture-in-picture view are created by clicking and dragging in the camera view area or the JMF camera image. The pan/tilt area allows the user access to the entire pan and tilt range of the camera. Figure 2 shows the main window on the right and the JMF player window on the left.

Figure 2: Camclnt Main Window and JMF Video Player Window

Because servers can send descriptions at any time, camclnt uses a main thread to respond to user-triggered events and a separate thread for receiving messages from the server. When a description arrives, the receiving thread updates the configuration information for the server and reconfigures the main window if necessary. Access to shared objects is synchronized with Java synchronization mechanisms.
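
The shared server information can be pictured as a small cache guarded by a lock. Camclnt is written in Java and uses Java synchronization; the Python sketch below only illustrates the same idea. The `ServerCache` class, its fields, and the rule for ignoring stale descriptions are assumptions.

```python
import threading


class ServerCache:
    """Latest description seen from each server, shared between threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._servers = {}

    def update(self, server: str, timestamp: float, devices: list) -> bool:
        """Record a description.

        Returns True if the server is new or its device list changed,
        i.e. when the main window would need to be reconfigured.
        """
        with self._lock:
            current = self._servers.get(server)
            if current is not None and current["timestamp"] >= timestamp:
                return False  # stale or duplicate description
            changed = current is None or current["devices"] != devices
            self._servers[server] = {
                "timestamp": timestamp,
                "devices": list(devices),
            }
            return changed

    def names(self) -> list:
        """Server names for the list shown in the user interface."""
        with self._lock:
            return sorted(self._servers)

    def devices(self, server: str) -> list:
        with self._lock:
            return list(self._servers[server]["devices"])
```

The receiving thread would call `update` for every description, while the user-interface thread reads through `names` and `devices`.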

Confcntlr

Confcntlr was developed to control media tools locally and remotely from a unified interface[13]. It supports the following actions:

Based on a peer-to-peer architecture, confcntlr is meant to execute on each host participating in a multimedia conference. Confcntlr supports encryption to preserve confidentiality and integrity of data exchanged between confcntlrs over the public Internet. Security features also allow a user to restrict access to the local confcntlr. The security operations are to:

Design

The conference controller was designed as a desktop application that permits access to itself and the tools it controls. It is single-threaded and handles all of its events within one event loop which utilizes a FIFO event queue. Communication between confcntlrs running on different hosts is via TCP connections while communication between applications running on the same host is via the multicast "conference bus." Figure 3 depicts this communication architecture.
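
A single-threaded event loop with a FIFO queue can be sketched as follows. Confcntlr is written in Tcl/Tk and C, so this Python version only illustrates the pattern; the `EventLoop` class and the event names are hypothetical.

```python
from collections import deque


class EventLoop:
    """Single-threaded loop that handles events in arrival order."""

    def __init__(self):
        self._queue = deque()
        self._handlers = {}

    def on(self, kind: str, handler) -> None:
        """Register the handler for one kind of event."""
        self._handlers[kind] = handler

    def post(self, kind: str, payload=None) -> None:
        """Append an event to the end of the queue."""
        self._queue.append((kind, payload))

    def run(self) -> None:
        """Process events until the queue is empty."""
        while self._queue:
            kind, payload = self._queue.popleft()
            handler = self._handlers.get(kind)
            if handler is not None:
                handler(payload)
```

Because handlers run one at a time on the same thread, they can share state without locking; an event posted by a handler is processed after everything already in the queue.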

Figure 3: Confcntlr Communication Architecture

In order to allow an arbitrary number of remote controllers, a separate TCP connection is used for each request; all other connections are opened once and remain open until program termination. Each conference controller controls one conference session at one site but multiple confcntlrs can be executed to allow users to participate in multiple conferences. Confcntlr can be launched from the session directory tool sdr.

Implementation

The conference controller was written in Tcl/Tk and C and runs under Solaris, FreeBSD, and Irix. It has been designed as four separate units that work together: a graphical user interface (GUI), a control unit, a network unit, and an encryption unit. Figure 4 shows the relationships of these units.

Figure 4: Confcntlr's Main Components

All actions go through the control unit. When a user manipulates a GUI control to invoke a local or remote operation, the control unit processes the request. To send a message to a remote host or to a videoconference tool on the same host, the control unit invokes the network component. The network unit invokes the encryption unit when messages are sent to or received from a remote host. When the network unit receives a message (from a remote host or local media tools), the control unit processes the message and invokes the GUI to display the output. The control unit carries out local operations invoked by the GUI or network unit. To start or stop a media tool on the local host, confcntlr spawns or terminates a process for the tool. To change local settings for a tool that is executing, the control unit formats a message and the network unit sends the message to the target tool. When a user invokes a remote operation, the control unit formats a message and the network unit sends it to the other confcntlr. The control unit also formats replies to requests received from other hosts and processes incoming replies.

The network unit is responsible for establishing and closing socket connections and transmitting and receiving all host-to-host and interprocess communication. All connections are nonblocking and a file handler is created for each socket. Requests and replies are usually not sequential and replies may not always be sent. There are two host-to-host communication schemes.

  1. Host A sends a request to host B. Host A does not wait for a reply unless it is obtaining a remote host's settings. Host B receives the request, processes it, and sends a reply.
  2. Host A notifies host B that some event took place (e.g., host A terminated a tool). Host B receives and processes the notification but does not send a reply.
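
The two schemes can be sketched from host B's point of view as a handler that returns a reply for requests and nothing for notifications. The message fields (`kind`, `op`, `tool`) and the dictionary layout are hypothetical; the actual confcntlr message format is not described here.

```python
def handle_message(message: dict, state: dict):
    """Process one incoming message on host B.

    Returns a reply dict for a request, or None for a notification.
    """
    if message["kind"] == "request":
        if message["op"] == "get_settings":
            # The one case in which host A waits for the reply.
            return {"kind": "reply", "op": "get_settings",
                    "settings": dict(state["settings"])}
        if message["op"] == "set":
            state["settings"][message["name"]] = message["value"]
            return {"kind": "reply", "op": "set", "ok": True}
        return {"kind": "reply", "op": message["op"], "ok": False}
    if message["kind"] == "notify":
        # e.g. host A reports that it terminated a tool: record it, no reply.
        state["remote_running"].discard(message["tool"])
        return None
    raise ValueError(f"unknown message kind: {message['kind']!r}")
```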

For interprocess communication, each process connected to a multicast channel receives a copy of all messages sent over the channel. If a process recognizes the message type, it processes the message locally and may also forward the message to another conference controller at a remote site. Confcntlr sends messages on the conference bus to dynamically change settings on the media tools and processes notifications received from the media tools that a setting was changed or that it is being terminated.

The encryption unit invokes functions from the SSLeay library[7] to encrypt plaintext and decrypt ciphertext with the Data Encryption Standard (DES). The user provides the encryption key, and decryption tries each of the user-defined keys in turn until decryption succeeds or all keys have been tried.
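
The try-each-key loop can be sketched as below. Two things here are assumptions and not taken from confcntlr: DES is not available in the Python standard library, so a toy SHA-256 keystream cipher stands in for it (it is not secure), and a fixed `MAGIC` prefix is used to recognize a successful decryption, since the text does not say how success is detected.

```python
import hashlib

MAGIC = b"CONF"  # assumed marker used to recognize a correct decryption


def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream built from SHA-256; a stand-in for DES, not secure."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def _xor(data: bytes, key: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt the marker followed by the plaintext."""
    return _xor(MAGIC + plaintext, key)


def decrypt_with_any(keys: list, ciphertext: bytes):
    """Try each key in turn; return the plaintext, or None if all fail."""
    for key in keys:
        data = _xor(ciphertext, key)
        if data.startswith(MAGIC):
            return data[len(MAGIC):]
    return None
```

A receiver that holds several keys, for example one per conference, does not need to be told which key a peer used; the cost is one trial decryption per key.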

The graphical user interface of confcntlr is designed to be unobtrusive so that it can run continuously on the user's desktop. It presents a main window with controls for basic operations and a display of status information. Popup windows can be opened for specific categories of functions (e.g., local or remote settings, security features, and general conference control). Menus with current values are presented along with fields for users to type new values. The main window is shown in Figure 5. The status checkbuttons indicate what media tools are running locally and remotely. The image in the lower left displays the remote access level; red, yellow, and green correspond to "allow no one," "allow authorized users," and "allow anyone," respectively.

Figure 5: Confcntlr's Main Window

The conference window, shown in Figure 6, is used for setting media tool addresses and launching the media tools. The window for changing the settings of remote media tools is shown in Figure 7; the "Local Settings" window is similar. Figure 8 shows the security window that contains access control, prompts, and warnings options, an access control list, and a "View key" button to invoke an encryption window.

Figure 6: Conference Window


Figure 7: Remote Settings Menu

Figure 8: Security Window

 

Background

The implementation of IP multicast over the Internet has inspired videoconferencing tools for video, audio, session directory, conference management, and shared workspace applications. These tools are built as standalone applications and integrated videoconferencing systems and are intended for use by people sitting directly at the computer terminals participating in the videoconference. This is appropriate for desktop conferencing, but in some situations there may not be anyone available to operate the tools at a site participating in the conference. In the case of collaboratories, our experience has been that the researchers present at an experiment site do not want to "tend" the videoconferencing tools for the remote collaborators. This is also the case for participants of a meeting in a conference room. But if no one at the sending host starts a video tool or turns on transmission, remote users have no way of receiving an image. It is also frustrating for a remote participant who is watching the video to be unable to move a remote camera or change the remote settings. Computer control of videoconferencing devices can also provide a non-disruptive means of moving cameras and improving audio quality locally.

As part of the Distributed Collaboratories project of the Imaging and Distributed Collaborations Group at Lawrence Berkeley Laboratory, we have designed, implemented, and deployed a camera control system and a conference controller. These tools give the remote user a sense of telepresence by providing remote control capabilities for videoconferences over the Internet. With the remote camera control and conference controller, collaborators can "walk" around a remote room, focusing in on what is taking place. This capability allows users to feel more like participants than merely observers. The camera control system consists of a server (devserv) and a client (camclnt) used to drive serial-controllable video devices. Devserv is run on the machine directly connected via serial connections to the camera system. Camclnt is the user interface that can be run remotely or locally to control the cameras. Through the camclnt interface the user can control camera pan, tilt, zoom, and picture-in-picture. The devserv and camclnt programs communicate via IP multicast and UDP unicast.

The conference control tool, confcntlr, enhances the usability of the media tools vic (video) and vat (audio) by providing an integrated and significantly enhanced user interface to these tools. Confcntlr allows conference participants from local and remote sites to change media tool settings. Confcntlr is based on a peer-to-peer architecture and it uses TCP connections to exchange messages over the Internet and IP multicast for communication with the media tools on the local host.

Related Work

Many of the early public domain, IP multicast-based videoconferencing tools were single-media standalone applications such as the sdr session directory[4] and the Lawrence Berkeley National Laboratory mbone tools: vic and vat for video and audio, and wb, a shared whiteboard[5]. Later development involved enhancing existing tools and building integrated systems. For example, University College London has added support for new video cards to vic and has developed the Robust Audio Tool (rat), which offers improved audio quality[12]. Although vic, vat, wb, and rat are independent applications, they all use a local multicast-based "conference bus" or "message bus" for interprocess communication.

An early conference management tool, the MultiMedia Conference Control program (MMCC), provides an integrated user interface to media tools, and offers session creation and invitation capabilities. Its "autopilot mode" allows users to accept invitations automatically, so the media tools can be started from a remote host, but settings cannot be changed remotely while the tools are executing[15]. More recent integrated conference management systems include the Multimedia Internet Terminal (MINT)[17], mStar[9,18], the MASH project[8], and the CORE2000 Collaboration Environment[16]. These environments provide cross-platform, integrated user interfaces for establishing, joining, and controlling multimedia conferences from the desktop. Although features vary from system to system, they support a wide range of collaborative capabilities that include invitation, voting, floor control, chat, media archiving and playback, resource reservation, and conferencing via a web browser. MASH has added to vic and vat an awareness interface for sending cues (e.g., intention to speak, inability to hear, and other similar feedback)[20] and has implemented the collaborator application, which provides an integrated user interface to the media tools[14].

Some of these systems support limited remote control and provide a platform for developing new applications and integrating them into the environment. mStar is a commercial product and includes a controller that drives cameras and a videoswitcher. Its development framework defines mechanisms for remotely controlling tools and parameters (e.g., stopping tools or changing bandwidth or frame rate from a remote host)[10]. CORE2000 provides remote startup and termination of applications and a camera controller for pan, tilt, and zoom. When a user starts a tool, it is automatically launched on all participants' hosts and, when a user terminates a tool, he or she is asked whether to stop the tool for everyone. Tool settings cannot be changed from a remote host once the tool is executing. CORE2000 supports third-party applications (e.g., CuSeeMe, Televiewer) and provides a framework for porting new tools to its Java environment[11].

MASH has implemented a remote-controlled version of vic (rvic) to allow conference participants in one room (without a technician) to manipulate a shared video display. An rvic server displays a set of windows representing the various video sources and supports different window layouts. With rvic clients, users can manually or automatically select a window layout and video sources displayed in the windows. Clients can be run on wirelessly-connected portable devices[6]. In addition to applications, MASH provides a network and media toolkit from which new applications can be built, and this toolkit includes agents for driving some serial devices. MASH is a research tool and so the priority is not on robustness or completeness of the tools.

 

Summary, Conclusions, and Future Work

This project focused on the design and implementation of a camera control system and a conference controller that together provide remote control capabilities to multimedia conferencing. These applications allow users to control the video devices and media tools used in a videoconference from anywhere over the Internet. With these tools, the person who is watching the transmission and cares most about how it is received can control the transmission. This new remote control capability has changed the videoconferencing paradigm to one of telepresence in which remote users become active participants rather than passive observers. The camera control system and conference controller have served an important role in videoconferencing at Lawrence Berkeley National Laboratory.

The conference controller and camera control system have been used in large conference rooms and by the Spectro-Microscopy Collaboratory[3] for videoconferencing. With these remote control tools, participants at the local site can spend their time on scientific experimentation or attending the meeting rather than 'babysitting' the videoconference tools. By presenting a unified interface to the media tools, confcntlr has made it easier to manage these separate tools. Confcntlr has also been used to remotely instruct a user in operating the media tools. Access control mechanisms allow protection of computer and network resources and have reduced concerns about being watched or heard without a user's knowledge and consent.

Our experience has been that the usability of multimedia conferencing tools is enhanced when there is a unified, intuitive, and configurable interface. We have also learned that, while multimedia tools are important for telepresence, collaborators also wish to interact in more lightweight ways, such as through instant messaging and document or application sharing. Our next goal will be to provide a persistent space that offers a collaboration environment in which participants can locate each other, use asynchronous and synchronous messaging, share documents and applications, and hold videoconferences. We plan to include remote control capabilities for videoconferencing within this environment. More information about the project is available at http://www-itg.lbl.gov and http://www-itg.lbl.gov/mbone.

 

References

  1. Agarwal, D., Schabert, P., Narasimhan, N., Berket, K., Foster, I., and Tuecke, S., The Collaboratory Interoperability Framework Common Application Programming Interface.
  2. Agarwal, D., and Perry, M., "Camera Remote Control Command Language"
  3. Agarwal, D., Johnston, W. E., and Perry, M., "The Spectro-Microscopy Collaboratory at the ALS," http://www-itg.lbl.gov/Collaboratories/ALS.html, LBNL Report #37331.
  4. Clarke, L., and Sasse, A., "Conceptual Design Reconsidered -- The Case of the Internet Session Directory Tool." Proceedings of HCI'97, Bristol, UK, August 1997.
  5. McCanne, S., and Jacobson, V., "vic: A Flexible Framework for Packet Video." ACM Multimedia, November 1995, pp 1-19.
  6. Hodes, T., Newman, M., McCanne, S., Katz, R., and Landay, J., "Shared remote control of a video conferencing application: motivation, design, and implementation." Proceedings of SPIE Multimedia Computing and Networking, San Jose, CA, USA, January 1999, pp. 17-28.
  7. Hudson, T. J., and Young, E., "SSLeay and SSLapps FAQ," http://psych.psy.uq.edu.au/~ftp/Crypto.
  8. McCanne, S., Brewer, E., Katz, R., Rowe, L., Amir, E., Chawathe, Y., Coopersmith, A., Mayer-Patel, K., Raman, S., Schuett, A., Simpson, D., Swan, A., Tung, T., and Wu, D., "Toward a Common Infrastructure for Multimedia-Networking Middleware", Proceedings of the 7th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV'97), May 1997.
  9. Parnes, P., The mStar Environment - Scalable Distributed Teamwork using IP Multicast. Licentiate of Engineering Thesis, Lulea University of Technology, September 1997.
  10. Parnes, P., Synnes, K., Schefstrom, D., "A Framework for Management and Control of Distributed Applications using Agents and IP-multicast." Proceedings of the 18th IEEE INFOCOM Conference (INFOCOM'99), 1999.
  11. Payne, D. A., and Myers, J. D., "The EMSL Collaborative Research Environment (CORE) - Collaboration via the World Wide Web." Presented at the IEEE Fifth Workshops on Enabling Technology: Infrastructure for Collaborative Enterprises (WET ICE'96), June 19-21, 1996, Stanford, California.
  12. Perkins, C., Hardman, V., Kouvelas, I., and Sasse, M. A., "Multicast Audio: The Next Generation." Proceedings of INET'97, June 1997, Kuala Lumpur, Malaysia.
  13. Perry, M., Confcntlr: A Videoconference Controller. Master's Thesis, San Francisco State University and Lawrence Berkeley National Laboratory, Publication Number LBNL-41154, December 1997.
  14. Romer, C. R., "A Composable Architecture for Scripting Multimedia Network Applications." Master's Report, University of California, Berkeley, July 1998.
  15. Schooler, E., "Case Study: Multimedia Conference Control in a Packet-switched Teleconferencing System." Journal of Internetworking: Research and Experience, Volume 4, Number 2, June 1993, pp 99-120.
  16. Schur, A., Keating, K. A., Payne, D. A., Valdez, T., Yates, K., and Myers, J. D., "Collaborative Suites for Experiment-Oriented Scientific Research." Interactions, Volume 3, May/June, 1998.
  17. Sisalem, D., and Schulzrinne, H., "The Multimedia Internet Terminal." Journal on Telecommunication Systems, Volume 9, Number 3, 1998, pp 423-444.
  18. Synnes, K., Lachapelle, S., Parnes, P., and Schefstrom, D., "Distributed Education using the mStar Environment." Journal of Universal Computer Science, Volume 4, Number 10, October 1998, pp 807-823.
  19. Thompson, M., Johnston, W., Mudumbai, S., Hoo, G., Jackson, K., Essiari, A., "Certificate-based Access Control for Widely Distributed Resources." Proceedings of the Eighth USENIX Security Symposium (Security'99), Washington, D.C., August 23-26, 1999, pp 215-227.
  20. Wong, T., and The MASH Research Group, "Inexpensive Techniques for Enhancing Awareness of Participants in Internet Conferencing Tools." Demonstration in the Proceedings of ACM Multimedia '98, Bristol, UK, September 1998, http://www-mash.cs.berkeley.edu/mash/pubs/index.html.