
My sincere gratitude goes to Dr. Irvine, Dr. Davis, and the entire Cyber Academic Group at the Naval Postgraduate School for their guidance, encouragement, and unrelenting pressure that made this thesis possible. To my advisors—thank you for your patience and for spending the time to ensure that I succeeded. To my wife and family—thank you for your unwavering support and daily motivation through all the late nights and weekends I spent in front of my computer… see—I really was working on something!

Throughout history, malicious attackers have targeted critical infrastructure assets that provide nations with essential services. As early as the sixth century BC, Solon of Athens contaminated the water supply of the town of Cirrha, allowing Athens to swiftly conquer the town while the Cirrhaeans were violently ill [1]. Even in this ancient case, a single malicious actor obtained unauthorized access to critical infrastructure assets and, prior to detection, caused severe physical consequences.

Today, critical infrastructure systems are more susceptible to malicious attack than ever before. These automation systems are ubiquitous and heterogeneous; they were designed with a great deal of implicit trust and are incompatible with most modern security solutions. The insecurity of these devices can lead to severe consequences, and malicious hacking tools can be employed against them with little difficulty. To leverage these tools successfully, however, an adversary must access the trusted network, leaving specific forensic artifacts and indicators of malicious ICS activity.

The ICS cyber incident response process is not well developed, and we lack tools built specifically for identifying current or historical adversary presence within the critical systems domain. Few published efforts reveal actionable technical solutions for ICS security practitioners, and none focus on reliably identifying malicious, persistent access within live data from production ICS devices.

Previous studies have outlined the need for forensic collection capabilities within the ICS environment. ICS-CERT provides a high-level strategy for forensics, makes the case for having an incident response capability, and presents a breakdown of what is different about ICS environments. However, the ICS-CERT best-practice documents only recommend that such a capability be created; they do not offer specific tools and techniques to implement incident response.

Critical infrastructure owners and operators are poorly equipped to discover and respond to intrusions into ICS networks. Effective, compatible tools do not exist to reliably extract the necessary technical data to analyze ICS environments with modern incident response techniques [7]. Adversaries construct surreptitious pathways to their target systems that provide persistent access for reconnaissance and future malicious activity. These adversary persistence mechanisms often remain undetected for significant periods of time. Security teams require a repeatable, tailored response methodology employing host and network data collection and analysis techniques to identify malicious pathways and adversary presence on ICS networks.

The methodology and supplementary toolkit within this thesis represent a step toward addressing that need. This thesis proposes a structured methodology to identify malicious activity by using host-based forensics and network analysis to identify anomalous client-side attack vectors during ICS assessments and incident response. This thesis will cover analysis of control systems that respond to command-line forensic interrogation techniques and that communicate over accepted, interpretable ICS protocols.

It will not cover embedded device firmware extraction or offline drive image analysis within the host-based methodology. This thesis also does not cover response actions should adversary presence be detected, nor disaster recovery, although specific case-by-case suggestions will be made. To develop the most robust and reliable methodology, real data from ICS networks will be used in this study. Some experimentation may be conducted through the course of regular assessments and incident response with critical infrastructure asset-owner approval.
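The command-line interrogation approach referenced above can be sketched in a few lines. The queries shown are illustrative placeholders, not the thesis toolkit's actual command set; the key design point is a hard timeout, so that a hung or incompatible host never blocks the collection workflow.

```python
import subprocess

# Illustrative read-only queries (not the thesis's actual command set).
READ_ONLY_QUERIES = {
    "connections": ["netstat", "-an"],
    "processes": ["tasklist"],
}

def interrogate(command, timeout=5.0):
    """Run a single read-only query with a hard timeout so a hung or
    incompatible host never blocks the collection workflow."""
    try:
        result = subprocess.run(command, capture_output=True,
                                text=True, timeout=timeout)
        return result.stdout
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return None  # record the failure; never retry aggressively on ICS hosts
```

On hosts where a utility is missing or unresponsive, as is common on stripped-down legacy ICS workstations, the collector degrades gracefully by recording `None` rather than disturbing the system.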

Multiple regulations restrict disclosure of this data; anonymized data will be used when possible, including host-based artifacts and scrubbed ICS network traffic. Additional experiments will be conducted within a closed network that replicates real ICS architectures. Chapter III contains technical case studies of relevant cyber incidents. Chapter IV describes a methodology for identifying malicious activity. Chapter V presents the experiments and findings of this study.

Chapter VI provides a summary and recommends areas for future work.

These vital systems are widely implemented across all infrastructure sectors to automate large industrial processes. Components of ICS networks are designed to allow embedded logic to control a process efficiently without constant human intervention; these components thus have specific roles and constraints, operating as low-level building blocks for industrial automation. Control systems were originally designed to monitor and control industrial processes in complete isolation, much as server mainframes were initially configured.

One central master unit provided all computing, control, and monitoring functionality, quietly executing simple instructions or ladder logic [9]. A redundant secondary system stood by; if it detected a fault, it commandeered all operations. Vendors continued to follow this mainframe-style architecture for ICS deployments for decades, installing unique systems with proprietary protocols as automation technology matured. As personal computer (PC) costs became less prohibitive and local area network (LAN) technology was embraced for business networks, individual computers began to replace static ICS components, enabling distributed functionality and processing across multiple control systems [9].

Data historians and human-machine interfaces (HMIs) were implemented not as standalone systems but as vendor-specific proprietary software on specialized personal computers. HMIs still provide the graphical user interface (GUI) front-end for operators today and continue to be one of the few human-to-machine touchpoints in the largely automated control environment. These computer-based processes ran modified versions of common operating systems and were generally vendor-provided systems, with only the vendor able to provide upgrades and system maintenance.

As a result of the newly distributed architecture, the ICS domain no longer required full-time redundant failover systems, because the PC-based systems could share the operational load of failed components. Vendors eventually embraced commercial off-the-shelf computer systems and networking hardware. The last section of this chapter examines ICS protocols more comprehensively.

Vendors used old or modified versions of Windows that, once initially tested for compatibility with vendor equipment, were rarely updated. These newly networked and communicating field devices, such as remote terminal units (RTUs) and programmable logic controllers (PLCs), allowed for customized implementations. RTUs are field devices that transmit telemetry data (information collected at remote or inaccessible points) to master systems. RTUs also accept commands from those master systems to control connected objects. With the introduction of these field devices, ICS component manufacturers enabled system integrators and critical infrastructure companies to make their own modifications and leverage their existing network infrastructure.

Companies have continued to develop networked ICS architectures that allow for streamlined performance tracking, accurate billing, and off-site backup capabilities, despite the growing security concerns that this interconnection has introduced. This fully distributed and networked architecture is necessary for industrial applications that require distributed monitoring points, such as electric power transmission and distribution, oil and gas pipeline and production operations, and water utility operations.

SCADA systems continue to manage very large-scale processes spanning multiple sites and large distances. HMIs are the devices that present process data to human operators, who control and monitor the processes. RTUs interconnect the sensors in the process and convert those sensor signals to digital data, which is routed to the SCADA system. PLCs are field devices that are cheap, flexible, and highly configurable. When combined within an industrial control network, these devices retain the features of their individual systems, leading to unique, expensive configurations that carry both the functionality and the weaknesses of the legacy equipment.

ICS components can be interconnected over a variety of mediums. Direct wiring is common for intra-facility connections. Microwave communication backbones are frequently used between facilities, especially in oil and natural gas pipelines, though fiber-optic inter-facility deployments are becoming more common. Both spread-spectrum and narrow-band radio are used to connect remote ICS components, and dial-up or cellular modems continue to provide connectivity to these systems. As explained above, there are many components and countless possible configurations. Attempting to broadly capture several possible ICS network architectures produces a complex diagram, such as Figure 1.

While this is helpful for understanding the complexities of these systems, a more abstracted model can be used to encapsulate the myriad of components and configurations. Researchers introduced an abstracted ICS architecture, shown in Figure 2, which identifies the key functions within the ICS domain [10]. At the top right of Figure 2, the traditional IT network, which hosts the corporate network and controls site manufacturing operations, is interconnected to the rest of the ICS network by one of several domain-interconnection methods discussed in the next subsection.

In addition to the distribution and communication paths between ICS components, there are often additional network interconnections between the traditionally isolated ICS domain and the corporate network. These interconnections are introduced for a variety of reasons. The ICS network contains valuable information for business applications, such as billing and financial data, equipment trending, and operational reports. Interconnections to external networks may have been introduced to allow for remote vendor support or, in the case of the inter-control center communications protocol (ICCP), for utilities to share electrical power status for grid stability [14].

Remote access technology allows for access to corporate software from field locations and provides capability to manage devices that are difficult to access. Dedicated lines are expensive, while the Internet is cheap and pervasive. Low cost and easy-to-install-and-maintain wireless connectivity has been added to field devices, allowing for the bypass of the physical security boundary and direct connection of field devices to the Internet.

ICS inter-domain networking can be architected using a variety of methods, such as explicit direct connections, firewall-controlled connections, demilitarized zone (DMZ)-only connections, or data diodes. Direct connections can be hard to trace, as they are constructed by employees or vendors uniquely for the site, usually with standard protocols such as secure shell (SSH), Telnet, the file transfer protocol (FTP), or a virtual private network (VPN), or with dial-up connections.

This means that direct connections may not be known to the security team. Firewalls and access control lists (ACLs) are often used for this interconnection and only allow certain types of connections, but their software provides only limited support for ICS protocols. DMZs are also common for these interconnections; however, they should not be constructed with access to the Internet like traditional web-server DMZs.

When properly implemented, the isolated networks can communicate with the DMZ but not with each other, making the DMZ a good location for storage servers such as databases and data historians. Data diodes enforce unidirectional network flow, making them ideal for the ICS environment, but they do not allow for acknowledgement or response and thus are not fully compatible with the transmission control protocol (TCP).

This means that data diodes, although difficult to implement with many protocols, can address a limited set of inter-domain communication needs. Regardless of interconnection type, a single control such as a DMZ, firewall, or data diode cannot provide sufficient defense alone. Since there are no guaranteed defenses for business networks and almost all are interconnected with ICS networks in some way, business network incidents can intentionally or collaterally affect ICS networks.
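The TCP limitation is easy to see in code. A connectionless datagram, sketched below in Python, can traverse a unidirectional link because it requires no return traffic, whereas TCP cannot even complete its handshake without a reverse path; the host and port values here are placeholders.

```python
import socket

def send_oneway(payload: bytes, host: str, port: int) -> None:
    """Send telemetry as a UDP datagram: no handshake or acknowledgement
    is needed, so the transfer can traverse a data diode."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))

# TCP, by contrast, requires a three-way handshake (SYN, SYN-ACK, ACK);
# with the return path physically absent, connect() would never complete,
# which is why TCP-based ICS protocols need a proxy on each side of a diode.
```

This is also why diode vendors typically terminate TCP sessions in proxy servers on either side of the one-way link rather than passing TCP through directly.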

In one reported incident, a stray message reached an Austrian ICS network, where it generated thousands of reply messages and flooded the control network. It is clear, then, that not only the interconnection medium must be understood but also the communication protocols used by the ICS devices. Since the development of ICS technology was vendor-driven, with a wide variety of competing hardware, software, and capabilities, the communication technology was similarly developed in a loose, ad hoc fashion and lacked a central standards body.

The resulting list of communication protocols includes Modbus and the distributed network protocol revision 3 (DNP3), as well as hundreds of proprietary protocols such as ANSI X3. ICS protocols employ multiple telemetry schemes, such as reporting by exception (common in Europe), round-robin communication, or polling at a timed interval. Most of these protocols are primitive, and field devices cannot be reliably queried to determine which protocols they support.

All these factors result in a highly complex forensic and incident response process. Despite their differences, ICS protocols function using master-slave communication. The master polls for data, controls slave devices, and maintains a repository of data. The slaves respond to master commands, transmitting either when polled or by reporting by exception. Although all components in the ICS domain are either a master or a slave, it is important to note that slaves can have more than one master, and a device can be a master in one environment and a slave in another, as in a tiered architecture.

This tiered structure, with a master at a remote site gathering all data for transmission to the next hop, saves bandwidth and reduces the poll cycle. As explained in the previous section, ICS design allows operators to interact with the control systems at an abstract level, significantly lowering the technical expertise and knowledge of the system required to keep the process running.
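The polling exchange at the heart of these master-slave protocols can be made concrete with Modbus, one of the protocols discussed in this section. Below is a minimal sketch of the request a master sends to read holding registers from a slave over Modbus/TCP (function code 0x03); the field values are examples only.

```python
import struct

def modbus_read_request(txn_id: int, unit_id: int,
                        start_addr: int, count: int) -> bytes:
    """Build a Modbus/TCP 'read holding registers' request.
    The MBAP header carries a transaction id, protocol id (always 0),
    the byte length of what follows, and the slave's unit id; the PDU
    is the function code plus starting address and register count."""
    pdu = struct.pack(">BHH", 0x03, start_addr, count)
    mbap = struct.pack(">HHHB", txn_id, 0x0000, len(pdu) + 1, unit_id)
    return mbap + pdu

# A master's poll cycle is little more than sending such frames to each
# slave in turn and parsing the fixed-format responses.
```

Note how little is in the frame: addresses and a register range, with nothing identifying or authenticating the sender, a point the next section returns to.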

The resulting state of ICS network security is that field devices have low capability, are designed for performance rather than security, and operate on a large number of unique protocols that must all be protected equally [17]. ICS systems were built to be highly available machines; integrity and confidentiality were afterthoughts. That is, ICS hardware was never designed with security in mind because it was originally deployed in isolated environments, removed from external networks. As those systems have become increasingly networked across much of critical infrastructure, ICS systems are more accessible and more susceptible to malicious attack.

The same control systems that replaced so many human functions, such as flipping a switch or turning a knob based on automated thresholds, also introduced a range of security concerns. The major vulnerabilities for ICS are in three areas: the prevalence of legacy equipment, the requirement for real-time availability, and the patching difficulties associated with partially or fully segmented networks.

The product life cycle for ICS equipment is considerably longer than that of traditional information technology (IT) systems. Vendor support is limited compared to traditional computers, with fewer support options and often a single vendor supporting many systems. Forensic capabilities are lacking on most ICS equipment, since field devices do not generally store logs and their alarms and responses can be suppressed.

To communicate in these mixed environments, with both legacy and modern equipment, proprietary ICS protocols were modified for use on IP-based networks. With prevalent cleartext communications and limited authentication or validation, ICS networks provide ample opportunity for tampering, interception, and injection of data. This can enable injection of telemetry data, adversary knowledge of system events, enumeration of equipment, or loss of market-relevant information.

From a security perspective, most ICS traffic is accepted if addresses match and cyclic redundancy checks (CRCs) validate. Modern protections such as encryption are often not implemented and frequently are not supported within legacy ICS protocols. In fact, the popular Modbus and DNP3 protocols currently do not support authentication, integrity checking, authorization, or encryption. Third-party security solutions may not be applicable, since the components were designed to support only the intended industrial process and may not have enough computing resources or memory to support added security capabilities.
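The CRC check mentioned above illustrates how weak this acceptance criterion is: a CRC detects line noise, not tampering, since anyone who alters a frame can simply recompute it. A sketch of the standard CRC-16/Modbus computation (initial value 0xFFFF, reflected polynomial 0xA001):

```python
def modbus_crc16(frame: bytes) -> int:
    """Compute CRC-16/Modbus: init 0xFFFF, reflected polynomial 0xA001."""
    crc = 0xFFFF
    for byte in frame:
        crc ^= byte
        for _ in range(8):
            lsb = crc & 1
            crc >>= 1
            if lsb:
                crc ^= 0xA001
    return crc

# An attacker injecting a forged frame simply appends modbus_crc16(frame);
# the receiving device accepts it, so the check provides no authentication.
```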

The later security is considered in a device's development, the more difficult and expensive it becomes to add. An ICS network has minimal tolerance for communication delay or data loss. The environment is expected to be available for extended periods of time and to meet strict timing requirements [14].

Whereas traditional IT systems are built around the confidentiality, integrity, and availability (CIA) triad, in that order, control systems are the opposite. They were designed with the requirement that real-time availability comes first, then the integrity of the telemetry data, and lastly confidentiality. Real-time availability requirements were historically addressed with redundancy [25], but full-time redundant backups are no longer the industry standard. With the same uptime requirements but without fully redundant backups, simple IT functions like rebooting may not be acceptable due to process availability requirements.

The throughput and uptime requirements of these networks create an environment where continuous, reliable operations will always trump security assessments, forensics, and incident response. ICS components also tend to be inadequately patched compared to IT systems. Whatever security is afforded by ICS network segmentation is exchanged for the ability to easily manage patches, antivirus definitions, and firmware updates.

Patch management is essential to update or repair components of systems that have identified vulnerabilities affecting the validity and integrity of device operation. This is compounded by the lack of vendor support and the difficulty of upgrading within the highly volatile and sometimes unstable environment in which control systems reside.

Months of planning are often required to take production ICS systems offline and apply patches. Critical infrastructure asset operators must weigh the planning difficulties and costs of patching ICS systems for stable and accurate operation against the possibility of malicious tampering with an unpatched component. Furthermore, even if patches are scheduled and applied properly, they can introduce instability into the ICS domain if not thoroughly tested. Unlike zero-day exploits, for which vendors and security firms can deploy patches, these vulnerabilities exist on the legacy equipment described above and can be targeted specifically, with the knowledge that they are weaknesses of older control systems that are no longer continually supported.

These systems directly control the processes that operate and stabilize our critical infrastructure, including everything from dams and water treatment plants to electric power utilities and nuclear generation plants. The failure of control systems can directly manifest in the physical world, with effects ranging from regulatory non-compliance to severe physical disasters. In the electricity sector, instability can lead to cascading outages and loss of communication, especially in power systems with dynamic and dramatic changes in load.

In that scenario, there is a complex, multi-step restoration process. First, critical loads like hospitals must have power restored, then generation facilities, transmission lines, distribution feeders, and lastly corporate and residential power service. In the oil and natural gas industry, system failure can result in fuel shortages with economic impacts and even explosions, causing loss of human life or environmental catastrophes. In the water sector, improper treatment or contamination of water, release of waste, or reduced pressure to emergency systems like fire hydrants are possible consequences.

Chemical-sector ICS systems can be tampered with, resulting in manipulation of chemical formulas. Even the transportation sector relies heavily on ICS networks for control of mass transit, and their failure could lead to train derailments or crashes. The hardware and software for ICS components are almost exclusively foreign-owned [29], and there are a limited number of critical asset manufacturers, with long lead times required for high-value components. For example, it can take six months to receive a replacement for certain transformers.

The failures of these systems can be very real. Had the supervisory ICS components remained responsive to the commands of the Olympic pipeline controllers, the controller operating during the accident would have been able to initiate actions that would have prevented the pressure increase that ruptured the pipeline. The northeastern region of the United States and Canada later experienced a blackout that affected 50 million people due to cascading failures of electrical grid operation; one affected nuclear plant took 48 hours to recover from shutdown.

To identify malicious activity in the ICS domain, it is necessary to review case studies of previous intrusions. Due to the limited availability of public data on past ICS incidents, this thesis categorizes similar events together to isolate behavioral trends. Past malicious activity related to control systems can be assigned to one of three categories: custom targeted exploits, commodity malware in the control environment, and unauthorized persistent ICS network access.

This category of malicious activity includes nation-state-sponsored hacking or other highly-targeted and well-funded intrusions, often using custom exploits. These attacks have targeted PLCs and other ICS field devices and can involve firmware tampering or exploiting other device vulnerabilities. Targeted attacks such as those sponsored by nation states tend to use zero-day exploits that have not been publicly disclosed or patched by the vendor. Historical examples of targeted critical infrastructure attacks include the proof-of-concept Aurora exploit, Shamoon, and most notably Stuxnet.

In one incident, a marine terminal in Venezuela was the target of attempted sabotage. The technical details have not been shared publicly, but it is believed that a team of hackers obtained access to the SCADA network of the oil tanker loading machinery and overwrote PLCs with an empty program module [16]. The result was an eight-hour halt of the oil tanker loading process until the backup ladder logic was reinstalled on the PLCs. In the electric sector, a comparable attack could be conducted locally or remotely, using unauthorized access to conduct man-in-the-middle or address resolution protocol (ARP) cache-poisoning attacks to inject breaker trip commands.

A piece of malware uncovered by researchers from the Belarusian security firm VirusBlokAda would eventually become known as the Stuxnet worm. Stuxnet exploited four zero-day vulnerabilities and used two compromised digital certificates. The malware did not target the PLCs and field devices directly; it exploited high-level application software on the controlling Windows-based systems with the ability to program PLCs [4].

Researchers have concluded that Stuxnet code remained idle for extended periods between attack sequences. The rootkit conducts denial-of-service attacks and overwrites flash memory to make the attack persistent through device reboots.

Malicious activity in the ICS domain can also include the intentional or accidental use or re-purposing of known malicious software, such as crimeware or banking Trojans. Even though these exploits and malicious software were not written to target ICS systems, commodity malware from traditional systems introduces instability into an already volatile ICS domain, with unknown effects on the operation of cyber-physical systems.

Because antivirus signatures exist for most broad-audience malware, the detection rate is high, and there is more publicly available technical information than for the other categories. Often, regulated critical infrastructure entities must report these incidents, and thus more are disclosed. We describe some representative examples.

In one well-known case, the Slammer worm infected an unpatched server in the ICS network and caused enough network congestion to shut down the Safety Parameter Display System for nearly five hours. Reporting indicates that the Slammer worm also impacted other electricity-sector systems [45]. In another incident, commodity malware installed applications and created backdoors while continuing to spread via infected e-mail attachments. Although not specifically designed for or targeting railway systems, the worm propagated into the control center, causing loss of signaling, dispatch, and other related systems.

The end result was multiple train delays and expensive clean-up costs. Later, thirteen United States automobile manufacturing plants operated by DaimlerChrysler were shut down when the Zotob worm was introduced into the control network. The intrusion was discovered prior to damage occurring, and thus the systems remained stable and operating. In another case, viruses intended to steal passwords and send them to a remote server jumped a significant physical air gap and infected laptops inside the International Space Station.

Although the impacts were minimal, the virus did make it onto more than one laptop, suggesting that it spread using internal networking or portable media. In a separate incident, several machines were likely affected, including two Windows-based engineering workstations, both critical to the operation of the control environment. This incident was caused when a technician used a USB drive to upload software updates during a scheduled equipment upgrade outage. Quarantine and restoration procedures at the company resulted in downtime for the impacted systems and a delay of the plant restart for three weeks [48].

RAT stands for remote administration tool as well as remote access Trojan. Analysis of the incident revealed that the webshell had been in place, and the modified Gh0st RAT malware compiled, long before discovery, indicating that the adversary was patient and had planned out the intrusion. The webshell redirected users at the Monju plant to another web server, from which a self-extracting compressed archive containing multiple malicious modules was downloaded, and various persistence mechanisms were established.

Outbound command and control traffic was eventually observed manually by on-duty personnel, weeks after the initial infection, while they were filing paperwork on the system. This intrusion only added to the troubled history of the Monju nuclear reactor and, following the intrusion, the plant was selected for decommissioning. A third category of ICS incident refers to any unauthorized persistent access on control center systems or field devices from another network, such as from the corporate network or the Internet.

This is generally accomplished by subverting access controls on built-in remote ICS network access solutions, for which malicious use is difficult to detect. In one case, an intrusion into an emergency alert network was not discovered until an emergency occurred at a Chevron refinery in Richmond, California, and the system could not be used to notify the adjacent community of the release of a noxious substance. Network configuration details have not been made publicly available to establish whether and how the emergency alert network was interconnected with ICS devices, but the critical alert system was intended to cover twenty-two states and several regions of Canada.

In another incident, a Boston teenager connected his personal computer without authorization to a dial-up loop carrier system servicing the Worcester airport and subsequently sent a series of commands that disabled it [54].

Additionally, the unauthorized access resulted in the disabling of phone service to homes in the area and affected the local weather service and fire department. Information from the criminal case indicates that the loop carrier system operated by the telephone company was accessible via modem to allow technicians to quickly change and repair customer service from remote computers. In a separate incident, a pair of insiders disrupted traffic signals at the signal control boxes of four critical intersections, resulting in significant backups and delays. This particular intrusion was launched prior to a labor protest by city employees.

The systems were disabled for almost two months before the intrusion was identified. This ICS safety-component failure caused thousands of dollars in damages for Pacific Energy Resources but did not trigger any leaks or physical consequences. In another case, the hacker known as Pr0f found that a water utility was running the Windows-based Siemens Simatic HMI software, a web-based dashboard for remote access to its SCADA systems, and was connected directly to the Internet with only a simple three-character password for protection.

In yet another incident, a compromised database contained sensitive details on the vulnerabilities of major dams across the United States. In a separate case, a SCADA server was directly connected to the Internet via a cellular data connection with no firewall or authentication controls in place. The studied intrusions often started within the business network, or the business network was compromised as a reconnaissance point for follow-up intrusions into the ICS networks.

In all studied incidents involving intentional malicious impacts to ICS systems, the adversary maintained a presence on the target system and had a channel through which to communicate commands. In rare cases, malicious access was achieved directly to the field devices over radio or another connection to the endpoints; however, the majority of incidents involved the compromise of Windows-based supervisory workstations that monitor and control field devices.

Targeted attacks and non-targeted commodity malware events also incorporated an early rogue connection to the ICS network; unauthorized access is thus a prerequisite for all incident categories. Focused efforts to identify attempted or achieved unauthorized ICS network access may therefore provide valuable early indication of many types of malicious activity.

Because availability requirements often prohibit restarting supervisory ICS machines, attackers needed only unsophisticated methods to maintain access to the target networks. Unauthorized access generally persisted for a significant amount of time in these case studies, allowing attack planning and reconnaissance. According to several post-incident analyses, these relatively simple persistence methods and the lengthy undetected malicious access to ICS networks were not reliably identified, in part due to the constraints of auditing these systems and concerns about introducing instability into the environment.

The incidents used malicious methods to target ICS field devices as well as more traditional intrusion techniques: establishing external command and control, scheduling tasks and modifying the registry to survive reboots, tampering with processes and services, implanting files for future action, moving laterally within the network, exploiting portable media devices, and abusing trusted channels to obtain and maintain an attack position on critical-infrastructure systems. The spectrum of methods was similar to that of attacks on computer systems and networks in general; therefore, existing effective identification techniques should be carefully adapted for use in ICS environments.

The design of a technique to identify malicious ICS activity requires careful planning to obtain unique ICS data sources and to collect and analyze that data under the strict operational constraints of critical networks. The proposed malicious-activity identification methodology consists of collection, analysis, and decision components for host-based and network-based ICS artifacts (Figure 3). The framework is modeled after modern intrusion-detection techniques employed in traditional networks [13]; however, these solutions are rarely deployed correctly on ICS networks at critical infrastructure sites [15], and most lack ICS protocol support as well as the signatures and behavioral-anomaly data necessary to identify ICS attacker tactics [7].

Although the goal is the creation of a toolkit that executes this methodology with reliable high-confidence output, the approach may need to be conducted manually at first and continually refined with automation in mind. ICS networks require that specific technical constraints be understood and adhered to for both host-based and network-based tools. Host-based tools are limited to those installed and already available on legacy operating systems.

There exists no centralized view of system security status to draw data from, and field devices do not always store logs. Any security-monitoring commands executed on these systems should be run at the lowest priority level to not interfere with critical processes. The unique configuration and availability requirements of these systems that were explored in previous chapters necessitate the adaptation of all host-based techniques for compatibility and the thorough testing of all commands to ensure critical processes will not be interrupted.

Network-based techniques are constrained by the instability introduced by active tools in the environment. Port scanning and automated device interrogation techniques can crash ICS hardware by scanning too fast or by sending null or malformed packets. Industry best practices recommend, and a few examples in the next section highlight, the importance of remaining entirely passive for network traffic analysis.

That analysis must also include the highly specialized and often proprietary ICS communication protocols. Furthermore, ICS domain interconnection methods and some regulatory restrictions limit the ability to tap critical networks at multiple locations. For these experiments, passive network traffic captures were examined that were collected using a high-capacity switch port analyzer (SPAN) port on a router or switch, a method that should be usable on any ICS network.

Several traditional penetration testing techniques can be translated for use on ICS networks as shown in Figure 4. For example, instead of running a network port scan that sends packets intended to elicit device information that could trigger unexpected results, a penetration tester should verify open ports on each host in the environment without generating network traffic.

This thesis explores eliciting considerably more detailed incident-response and forensic data from these systems while following similar constraints. Accounts exist of security researchers ignoring these constraints. In another case, operators at the Browns Ferry nuclear plant had misconfigured products from two different vendors, which resulted in excessive traffic on the control network. These examples demonstrate the complex requirements of any security assessment or forensic action conducted on production ICS networks. An ideal toolkit requires the hand-selection of the most compatible host- and network-based tools capable of collecting and analyzing malicious activity while still providing ample ICS network coverage.

To aid host stability, an implementation should focus on agentless, built-in commands that generate minimal network traffic. For host-based querying of Windows artifacts, only built-in command-line utilities should be used, with an emphasis on the Windows Management Instrumentation Command-line (WMIC) tool. While no documentation exists on the use of these tools for forensics or live response on control systems, ICS-compliant queries can be tailored to identify unauthorized access based on real-world malicious ICS activity.
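
As an illustration, the collection step can be sketched as composed command strings. The artifact names, the specific WMIC verbs, and the use of `start /low` for lowest-priority execution are assumptions for this example, not the thesis's actual query set:

```python
# Sketch (assumptions noted above): composing low-priority WMIC queries that
# mirror the host-based collection described in the text. Strings only; they
# would be executed on the Windows host itself, or remotely via /node.
QUERIES = {
    "os":        "os get caption,version,lastbootuptime /format:csv",
    "processes": "process get name,processid,commandline /format:csv",
    "autoruns":  "startup get caption,command /format:csv",
}

def wmic_command(artifact, node=None):
    """Build a WMIC command line; `node` optionally targets a remote host."""
    target = f"/node:{node} " if node else ""
    # "start /low /b" launches the query at the lowest scheduling priority
    # so it cannot preempt critical control processes.
    return f"start /low /b wmic {target}{QUERIES[artifact]}"

cmd = wmic_command("processes", node="HMI-01")
```

Running the commands locally and exporting the CSV output by hand, as described below, keeps the network entirely quiet.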

These tools can run with user privileges but offer the most functionality when run as an administrator. For research and testing purposes, a new shared administrator account with strong authentication was created on every test workstation to centrally query the hosts.

Documented best practices recommend banner grabbing as a safe activity [58], and this WMIC node-querying method uses similarly minimal network bandwidth. For critical real-world networks, the toolkit can run the commands locally with no network traffic, with manual export of the results to be collated on a separate closed network. The experiments used the Bro platform for network-based analysis [75]. Bro is an open-source network-security platform composed of signature detection, anomaly detection, and a programming language designed to work with network traffic; its protocol analyzers abstract traffic details into logs in real time.

Alerts can be generated based on pre-configured signature and anomaly notification rules. The programming language defines the actions the platform takes based on logic and structured programming. Bro can parse and analyze network traffic and analysis can be automated through the creation of customized scripts to identify malicious ICS network activity.

The Modbus and DNP3 analyzers process significantly more data than that provided in Figure 6, so custom scripts can be written to manipulate register addresses, values, and additional payload data. The created toolkit should scale in function as future Bro ICS protocol analyzers are developed. The Bro log fields common to both analyzers include the message timestamp, the connection unique ID, and the message function code.

Further shared log fields include the message failure exception code and the response internal identification (the Modbus and DNP3 Bro log fields). Additionally, several open-source libraries such as jQuery and jVectorMap can be used to aid reporting and visualization for the network-based analysis toolkit. Host-based and network-based tools can provide ample coverage of most ICS networks while respecting the constraints of device operation and real-time availability. Host-based tools should be selected to provide very high coverage, using data-driven, non-probability judgment and convenience sampling.

Host-based tools should be used locally on compatible hosts with minimal network traffic generated. Special logic should be included in the host-based scripts to ensure stability and compatibility with various legacy versions of the Microsoft Windows OS. The network-based analysis should be completely passive by design so as not to interfere with reliable process control. Specifically, the network toolkit should cover traffic for all hardware, including field devices that communicate over Modbus TCP and DNP3, widely considered the two most-implemented ICS protocols in industry [18].

Support for more ICS network protocols can be extended as time permits. This section details proposed technical identification techniques for adversary tactics as observed in the malicious ICS activity case studies. When applicable, host and network script excerpts are provided to illustrate usage.

In several cases examined, the only indication of malicious access was the anomalous operation of the ICS field devices. Stuxnet resided on the WinCC HMI and, using a known hardcoded password, modified motor rotation frequencies in Siemens programmable logic controllers as well as valve settings. This suggests that cleartext protocol data should be examined and that ICS protocol datagrams should be inspected for known default passwords and other vulnerabilities.

Obtaining and validating upper and lower field-device register-boundary values against site-specific expectations and equipment-tolerance values may help identify overclocked or maliciously manipulated devices. ICS protocol operations should also be used to automatically create a catalog of devices: for example, Modbus function codes 0x11 and 0x2B query for device details, and responses to these packets can be passively inspected for field-device characteristics.

To extract high and low values, the maximum and minimum values for each device register should be stored and checked upon single register request, single register response, and multiple register request events. The script should capture register-boundary values from replayed historical samples then create expected register limits with which to monitor real-time traffic, which requires fewer calculations and events.
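
A minimal sketch of the register-boundary learning described above, with Python standing in for the Bro script logic (the register address and values are hypothetical):

```python
# Sketch: learn per-register high/low limits from replayed historical
# traffic, then check real-time values against the learned limits.
class RegisterBounds:
    def __init__(self):
        self.limits = {}  # register address -> [low, high]

    def learn(self, register, value):
        lo, hi = self.limits.setdefault(register, [value, value])
        self.limits[register] = [min(lo, value), max(hi, value)]

    def check(self, register, value):
        """True if the value falls inside the learned limits."""
        if register not in self.limits:
            return False  # a never-before-seen register is itself suspicious
        lo, hi = self.limits[register]
        return lo <= value <= hi

bounds = RegisterBounds()
for v in (1200, 1250, 1190):        # replayed historical sample
    bounds.learn(40001, v)
ok = bounds.check(40001, 1210)      # inside learned limits
bad = bounds.check(40001, 9999)     # possible malicious manipulation
```

In live operation only `check` runs per event, which is the "fewer calculations" property the text relies on.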

Because the ICS network protocol data is passively ingested on network-infrastructure SPAN ports, and Bro is designed to distribute and manage loads [75], no network latency will be introduced when running any network-based scripts on historical or real-time traffic. If too much network traffic is aggregated, packets will merely be dropped on the monitored SPAN port; both ICS process control and network infrastructure operations will continue without degraded or disrupted performance. The analysis of ICS field device register-value ranges requires additional information such as the make and model of equipment and what process it may be controlling.

Organizationally unique identifier (OUI) bits should be extracted from media access control (MAC) addresses to assist in fingerprinting devices, both for overall awareness and for verification of specific ICS hardware limits. A repository of ICS-specific OUI vendor information and possible field-device functions has been researched and created for this thesis for offline component identification. Unfortunately, Bro abstracts network traffic at an early phase of analysis and does not currently allow the extraction of MAC addresses from ICS protocol communication.
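
The OUI extraction step can be sketched as follows; the vendor-table entries here are illustrative placeholders, not the thesis's researched repository:

```python
# Sketch: extract the OUI (first three octets) from a MAC address and match
# it against a small vendor table (hypothetical example entries).
ICS_OUI = {
    "00:1C:06": "example PLC vendor (placeholder entry)",
    "00:0B:AB": "example RTU vendor (placeholder entry)",
}

def oui(mac):
    """Normalize separators/case and return the first three octets."""
    return mac.upper().replace("-", ":")[:8]

def vendor(mac):
    return ICS_OUI.get(oui(mac), "unknown")

v = vendor("00-1c-06-aa-bb-cc")
```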

Additionally, the host-based scripts should query each system to extract MAC addresses from its local ARP table for devices with which it has communicated, and from any in-range wireless infrastructure (Figure 7). This method should assist in monitoring for ARP poisoning as well. Dual-homed devices, systems with multiple network interface cards, are attractive pivoting targets for malicious attackers, so any MAC address with multiple IP addresses should also be identified. Figure 7. Host-based passive ICS device enumeration batch script.
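
Collating the ARP output and flagging MAC addresses seen with multiple IP addresses, as described above, might look like this sketch (the entries are hypothetical):

```python
# Sketch: flag MAC addresses that appear with multiple IP addresses in
# collated ARP-table output -- possible dual-homed pivots or ARP poisoning.
from collections import defaultdict

def multi_ip_macs(arp_entries):
    """arp_entries: iterable of (ip, mac) pairs gathered from host ARP tables."""
    seen = defaultdict(set)
    for ip, mac in arp_entries:
        seen[mac.lower()].add(ip)          # normalize case before grouping
    return {mac: ips for mac, ips in seen.items() if len(ips) > 1}

entries = [("10.0.0.5", "AA:BB:CC:00:11:22"),
           ("10.0.1.9", "aa:bb:cc:00:11:22"),   # same MAC, second subnet
           ("10.0.0.7", "AA:BB:CC:33:44:55")]
flagged = multi_ip_macs(entries)
```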

Since ICS traffic should generally be deterministic (the operation of a device is predictable because critical processes rely on predictable outputs for given inputs), the network traffic can be compared against ICS traffic conventions to isolate tampering or anomalous behavior. This provides a stronger indication of malicious activity than on traditional computer systems. One such convention is hierarchical communication, where devices communicate one-to-many or master-to-slave.

Master and slave roles can be easily extracted for IP-based protocols because the source and destination addresses for a specific message type imply its role. Another convention is that ICS field-device communication is consistent: where packet timing on traditional IT networks is influenced by human interaction, ICS protocol traffic generally occurs on set polling intervals.
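
The polling-interval convention can be checked with a simple inter-arrival test; the tolerance value is an assumed, illustrative parameter:

```python
# Sketch: ICS polling traffic is highly regular, so a large deviation in
# inter-arrival times for one master/slave flow can indicate injected traffic.
def irregular(timestamps, tolerance=0.25):
    """Flag a flow whose inter-arrival gaps deviate from the mean polling
    interval by more than `tolerance` (as a fraction of the mean)."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    mean = sum(gaps) / len(gaps)
    return any(abs(g - mean) > tolerance * mean for g in gaps)

steady = irregular([0.0, 2.0, 4.0, 6.0, 8.0])     # fixed 2 s poll
tampered = irregular([0.0, 2.0, 2.3, 4.0, 6.0])   # extra injected packet
```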

Additional logic should be added into the Bro scripts for ICS field devices to check for suspicious functions and operations that require an explicit interactive command or reprogramming. Engineers and operators, when presented with this data, would know instantly if it was a command that they issued or if it was a possible unauthorized injected command. The Bro script should include specific verbiage detailing the potential security concern of the ICS field device command transmitted.

For Modbus, function code 0x7D is issued to initiate firmware replacement. Those commands would be instantly verifiable if detected in live ICS traffic. Less critical alerts should also be included in the scripts but require operator analysis to determine whether they indicate malicious activity or a misconfigured ICS network. Since we have established that Modbus is hardware-agnostic and not aware of device register limits, an exception of this kind means that the structure of a query was unexpected. Modbus exception code 0x0B indicates that a target device failed to respond or may not be present on the network.

DNP3 uses code 0x21 for an authentication error and 0x82 for an unsolicited response, both of which should be extracted by the created scripts, but it will require an operator to discern whether the presence of the codes indicates malicious activity or device misconfigurations. Comparing observed ICS communication against protocol conventions like these should allow the identification of malicious persistence without the need for an anomaly-based learning period.
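
The code-to-alert mapping described above can be sketched as a lookup table; the severity labels are assumptions for illustration:

```python
# Sketch: map the Modbus/DNP3 codes discussed in the text to alert text so
# an operator can quickly verify whether a command was legitimately issued.
CRITICAL = {("modbus", 0x7D): "firmware replacement initiated"}
ADVISORY = {
    ("modbus", 0x0B): "target device failed to respond / possibly absent",
    ("dnp3",   0x21): "authentication error",
    ("dnp3",   0x82): "unsolicited response",
}

def alert(protocol, code):
    key = (protocol, code)
    if key in CRITICAL:
        return ("CRITICAL", CRITICAL[key])
    if key in ADVISORY:
        return ("ADVISORY", ADVISORY[key])   # needs operator analysis
    return ("NONE", "")

level, msg = alert("modbus", 0x7D)
```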

The majority of cases studied included some form of malicious communication to external networks. The high signal-to-noise ratio within IP-based ICS networks should make the identification of malicious external communication significantly easier when compared to traditional enterprise networks. All versions of the Mariposa malware use custom UDP datagrams for communication, and infected systems may beacon frequently and send encrypted instructions and data across a variety of UDP ports.

Hardcoded domain names are used in most variants as well; the connections were established with the malicious server testqeasd. The DNS cache, cookies, and the hosts file can be examined for previous outbound attempts and may help identify planned routes or dormant malware. Since this thesis has argued that ICS networks should not connect to the Internet directly, all attempted or successful connections to external IP space may be of concern, unlike on business networks. For instance, neither a half-open TCP handshake nor a connection rejected by an external IP issuing a reset packet would be considered a connection on most traditional networks.

Any external connection attempts using ICS protocols are of major concern, and they should be prioritized for the toolkit user to review. Every unique TCP and UDP 4-tuple socket pair should reliably generate an event, at which time the network-based toolkit should apply logic to isolate the new, meaningful external connections and their connection statuses. The bottom of Figure 8 also displays a portion of the robust ICS port-and-protocol pairing that should be matched to newly identified external connections. This ICS protocol tagging should provide helpful context for determining the security risk of a particular connection, since a previously unknown connection may be of more immediate concern if it is communicating over an ICS protocol.
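
A sketch of the external-connection filtering and ICS port tagging (the internal ranges and port table are illustrative assumptions; Python stands in for the Bro script):

```python
# Sketch: keep only connection tuples bound for external IP space and tag
# those on well-known ICS ports (502 = Modbus TCP, 20000 = DNP3).
import ipaddress

ICS_PORTS = {502: "modbus", 20000: "dnp3"}
INTERNAL = [ipaddress.ip_network("10.0.0.0/8"),
            ipaddress.ip_network("192.168.0.0/16")]

def external_conns(conns):
    """conns: iterable of (src_ip, dst_ip, dst_port) tuples."""
    out = []
    for src, dst, port in conns:
        addr = ipaddress.ip_address(dst)
        if not any(addr in net for net in INTERNAL):
            out.append((src, dst, port, ICS_PORTS.get(port, "other")))
    return out

flagged = external_conns([("10.0.0.5", "10.0.0.9", 502),      # internal: kept out
                          ("10.0.0.5", "203.0.113.7", 502)])  # external Modbus!
```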

External ICS network communication identification Bro script excerpt. Passive IP address geolocation can be conducted within this same script using the offline MaxMind GeoIPLite database to plot attempted and established connections from the ICS network along with the protocol used. The geolocation and plotting of external connections on a map should help determine approximately where anomalous traffic is bound. This offline JSON data feed should be continuously updated while being projected onto a world map using the jVectorMap library.

Bro script variable translation to jVectorMap overlay data. Similarly, inbound Telnet sessions from the corporate network, simple network management protocol SNMP versions 1 and 2, and any simple mail transfer protocol SMTP or other e-mail traffic should be flagged [65]. If internal web servers are being used for SCADA services, examining irregular HTTP properties like user agent strings can help to identify automated attack tools or manually-performed attacks.

The noise may be due to imperfections in the optical scanning process.

These tasks are commonly performed prior to character recognition, when it is necessary to eliminate the imperfections introduced by the optical scanning process. There are many techniques to reduce the noise; basically, a filtering function is used to smooth the image. For example, the symmetric Gaussian filter function is used for smoothing equally in all directions. An alternative approach is morphological processing, built on two basic morphological operations: erosion and dilation. The normalization method is popularly used in character recognition to reduce all types of variation; however, it can also give rise to excessive shape distortion.
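
The two basic morphological operations can be illustrated on a tiny binary image; this is a pure-Python sketch using a 3x3 cross structuring element:

```python
# Sketch: binary erosion and dilation with a cross-shaped structuring
# element. At the image border only the in-bounds neighbors are considered.
def neighbors(img, y, x):
    h, w = len(img), len(img[0])
    for dy, dx in ((0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)):
        ny, nx = y + dy, x + dx
        if 0 <= ny < h and 0 <= nx < w:
            yield img[ny][nx]

def erode(img):
    return [[1 if all(neighbors(img, y, x)) else 0
             for x in range(len(img[0]))] for y in range(len(img))]

def dilate(img):
    return [[1 if any(neighbors(img, y, x)) else 0
             for x in range(len(img[0]))] for y in range(len(img))]

speckled = [[0, 0, 0],
            [0, 1, 0],   # isolated noise pixel
            [0, 0, 0]]
cleaned = erode(speckled)   # erosion removes the isolated speck
```

Erosion followed by dilation (an "opening") removes speckle noise while roughly preserving stroke shape.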

The usual methods for normalizing a character address skew, slant, and size. Skew normalization: due to variation in writing style, skew can hurt the effectiveness of recognition. Various methods have been used, such as projection-profile analysis. After skew detection, the character or word is translated to the origin and rotated until the skew is removed. Slant normalization: the character inclination typically found in cursive script is called slant. Formally, it is defined as the angle between the longest stroke in a word and the vertical. Slant normalization is used to normalize all characters to an upright position.

Many methods have been proposed to detect slant; one of them is based on the center of gravity. Size normalization: it is used to adjust the size, position, and shape dimensions of the character. This step is required for reducing the shape variation between images of the same class. The smoothing operation is done to regularize the edges in the image and to remove small bits of noise. In Freeman's direction extraction, smoothing is done by comparing each code with its neighbors. Skeletonization is a morphological operation used for reducing foreground regions to a thin skeleton of their contour. Methods for skeletonization are divided into two main approaches: iterative and non-iterative.

When using the iterative approach, the contour-peeling process proceeds either in parallel or sequentially; in contrast, the non-iterative methods derive the skeleton directly. Unfortunately, these techniques are difficult to implement and slow as well. Thinning can be performed for skeletonization using methods like erosion; in this mode, it is commonly used to reduce all lines to single-pixel thickness. When designing a digit recognition system, the most important step is the segmentation of the digits. This step is a non-trivial problem due to several factors. The first one is the inherent nature of the script, which is cursive and prone to touching digits.

The second one is the high degree of variation in writing styles produced by different writers. The segmentation systems can be divided into two approaches: implicit and explicit. Indeed, the implicit approach does not attempt to separate digits, but rather it incorporates segmentation into the recognition process. In the explicit case, the segmentation and recognition are performed separately. Hence, many algorithms were proposed to separate the couple or string of contiguous digits, among them the recognition-based approaches.

The segmentation-recognition approaches are used to construct the segmentation paths, where each segmented piece is evaluated by the recognizer. Generally, the recognition reaches very high performance when dealing with well-spaced digits. However, when a complete segmentation system is used, the recognition is more difficult, since segmentation errors propagate into it. When the over-segmentation problem occurs, the segmentation cut is performed in intra-digit regions.

When the under-segmentation problem occurs, the lack of a segmentation cut produces wrong groupings; for example, connected digits are often treated as a single character. Therefore, this problem can cause confusion between isolated digits and digit pairs. Some examples of the under-segmentation problem are shown in the referenced figure. The proposed segmentation system for separating the couple or string of contiguous digits will be presented later. Feature generation can be defined as the problem of extracting the most pertinent information from the character image. The feature generation methods are many and varied.

Each one has its own properties and can suit different applications, so the choice of a method must be made carefully. There are numerous families of features. The global features are generated from the entire character image, using for instance the center of gravity. Global features offer faster speed, since few values are required for calculation and matching; they are generally used for simple character detection. The statistical features are derived from the statistical distribution of pixels in the character image.

They offer high speed and low complexity and tolerate variation in writing. They may also be used for reducing the dimension of the feature set. Among the most common statistical features are the moments extracted from images (Hu moments, Zernike moments). Zoning is therefore used for dividing the character into several frames from which local features are computed. The geometrical and topological features may represent global and local properties of the character.
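
The zoning idea can be sketched as follows, using foreground-pixel density per zone as the local statistical feature (the 2x2 grid size is an illustrative choice):

```python
# Sketch: divide a binary character image into a grid of zones and use the
# foreground-pixel density of each zone as one feature.
def zoning_features(img, rows=2, cols=2):
    h, w = len(img), len(img[0])
    feats = []
    for r in range(rows):
        for c in range(cols):
            zone = [img[y][x]
                    for y in range(r * h // rows, (r + 1) * h // rows)
                    for x in range(c * w // cols, (c + 1) * w // cols)]
            feats.append(sum(zone) / len(zone))  # density in [0, 1]
    return feats

digit = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
f = zoning_features(digit)
```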

The topological features can encode some knowledge about the structure of the object. The specific features include shape descriptors and geometric structure features. The geometrical and topological features generated from the character include measures of its contours and strokes; all these measures can be integrated into a single feature vector for recognition of a character.

The classification is a technique that allows assigning an unknown pattern to a predefined class. The classification workflow uses either unsupervised or supervised methods. In unsupervised classification, the classes are generated according to the distribution of the data alone. In contrast, the supervised classification requires training data. For handwritten digit recognition, the nearest neighbour classifier is one of the simplest methods, and Support Vector Machines (SVM), originally developed as two-class classifiers, have been extended to the multiclass case.

In the case of handwritten digit recognition, supervised classification has been adopted. It can be performed in two stages: a training stage and a recognition and decision stage. The goal of the training stage is to train the classifier with the known digit samples. Recognition and decision stage: this stage classifies the input patterns by comparing them to the trained models, using classification techniques in the form of decision rules. The decision stage is strongly influenced by the feature generation step. Statistical techniques are based on a statistical decision theory that uses statistical decision rules; the analysis is performed on the shapes to be recognized.

However, it requires a significant number of samples in order to achieve correct training of the classifier; studying those samples allows reflecting their distribution in the feature space. The main statistical techniques which are applied in the character recognition field are parametric and non-parametric recognition. Parametric recognition: this method requires a priori information about the characters in order to estimate the parameters of the model; once the parameters are estimated, classification follows directly. Non-parametric recognition:

This method does not require a priori information about the characters. It is used to separate different pattern classes along hyperplanes defined in a given feature space. The best-known method of non-parametric categorization is the Nearest Neighbor (NN) rule, which is extensively used in character recognition: an incoming pattern is assigned the class of its closest training sample. Syntactic and structural techniques are based on structural primitives, taking into account the relations between them. In general, it is assumed that the character primitives can be extracted reliably. Furthermore, structural methods are strong in finding a correspondence mapping between patterns.
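
The Nearest Neighbor rule described above, in a minimal sketch:

```python
# Sketch: 1-NN classification -- an incoming pattern is assigned the label
# of its closest training sample (squared Euclidean distance).
def nn_classify(sample, training):
    """training: list of (feature_vector, label) pairs."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(training, key=lambda t: dist(sample, t[0]))[1]

training = [([1.0, 0.0], "1"), ([0.0, 1.0], "7")]
label = nn_classify([0.9, 0.1], training)
```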

The main difference between these structural techniques and statistical techniques is that the former explicitly model the structure of the pattern. Several techniques are available. The stochastic techniques use a model for recognition, taking into account the high variability of handwriting. In these techniques, the models are often discrete, and many studies are based on Markov models; Markov fields permit the modeling of the two-dimensional structure of the image. The model describes states using state-transition probabilities. Overview of Support Vector Machines (SVM): the support vector machine (SVM) is a popular classification technique with linear and nonlinear variants.

SVMs were originally designed for two-class problems. In addition, they address two major disadvantages in machine learning, notably over-learning (overfitting), which they are built to avoid. Two other advantages are particularly offered by SVMs. First, with an appropriate kernel, the SVM can work well even if the data are not linearly separable in the input space.

Second, SVMs behave especially well in high-dimensional spaces. These two advantages allow us to select SVMs as good candidates for performing the classification. SVMs are a machine learning algorithm for performing classification and regression via a separating hyperplane; an SVM has the ability to select a representative subset of the dataset, the support vectors. Once learned, a decision function is constructed in order to classify data according to the side of the hyperplane on which they fall. The error is minimized by maximizing the margin, controlled by the VC dimension of the classifier. For better clarity, the theoretical aspects of SVMs are explained in the following. In simple terms, given a set of training samples, each labeled as belonging to one of two classes, new samples are then predicted to belong to one class or the other.

It is based on the use of a function called the decision function. The principle of SVM optimization is to maximize the margin between classes in order to improve generalization: the SVM classifier assigns all the points on one side of the hyperplane to one class and all points on the other side to the other, where w is the normal of the hyperplane. In the case when the training data are linearly separable, the classifier can select two parallel margin hyperplanes with no data points between them. For all support vectors the margin constraints hold with equality, and it is possible to rescale w and b so that the two class constraints combine into one set of inequalities. The points for which the equality holds lie on the margin hyperplanes h1 and h2.

SVM constructs a decision surface (hyperplane) that maximizes the separation margin between the two classes. The VC dimension is a measure of the capacity and complexity of a statistical classification model. More precisely, the support vector machine is an approximate implementation of the structural risk minimization induction principle. In the separable case, the maximum possible margin between the hyperplane and the support vectors is sought directly; in the non-separable case, some violations must be tolerated. According to the optimization theory, the solution is obtained by cancelling the partial derivatives of the Lagrangian with respect to the primal variables.

In this expression, the Lagrange multipliers are deduced by setting the partial derivatives to zero. Only the support vectors receive nonzero multipliers; all other points have null multipliers. Finally, the equation of the separating hyperplane is expressed in terms of the support vectors alone. The Soft Margin method allows selecting a hyperplane that splits the examples as cleanly as possible even when the classes are not perfectly separable.

The good-ranking constraint is relaxed: the Soft Margin method introduces non-negative slack variables, which measure the degree of misclassification of each sample. If the penalty function is linear, the optimization problem minimizes the sum of the slack variables together with the margin term, so the maximum possible margin between the hyperplane and the support vectors is traded against training errors. The trade-off is governed by a penalty parameter, which allows striking a balance between the two competing criteria of margin maximization and error minimization.
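
The soft-margin problem described above is conventionally written as follows (standard formulation, reconstructed here because the original equations did not survive extraction):

```latex
% Soft-margin primal: the slack variables \xi_i measure the degree of
% misclassification; C balances margin width against training error.
\min_{w,\,b,\,\xi}\;\; \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_{i}
\qquad \text{subject to} \qquad
y_{i}\left(w\cdot x_{i}+b\right) \ge 1-\xi_{i},\qquad \xi_{i}\ge 0,\; i=1,\dots,n.
```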

The problem of a nonlinear classifier is addressed by a nonlinear transformation of the inputs into a higher-dimensional feature space; the method of linear separation presented previously is somewhat limited. The main idea of the nonlinear extension is that data not linearly separable in the original space may become separable in an enlarged feature space. For example, data not linearly separable in 2-D can be mapped onto three dimensions where a separating plane exists.

In this high-dimensional space, the linear machinery applies unchanged. In the SVM formulation, the training data only appear in the form of dot products; these can be replaced by dot products in the enlarged Euclidean space computed through a kernel function, so the mapping never needs to be evaluated explicitly. In the equation of the separating hyperplane, the dot product is likewise replaced by the kernel. The kernel functions commonly used are the linear kernel, sometimes parametrized with an offset, and the polynomial kernel.

Other common choices are the sigmoid kernel (hyperbolic tangent) and the Gaussian kernel. The effectiveness of SVM depends on the selection of the kernel, the kernel's parameters, and the penalty parameter. A common choice is a Gaussian kernel, which has a single width parameter; the best combination of kernel and penalty parameters is typically found by search over a validation set. SVM is designed only for separating two classes; its extension to multiple classes has led to several strategies, and the multiclass problem can be processed as a series of binary problems. When using SVM for handwritten digit recognition, we have ten classes.
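
The kernels listed above, as simple functions (the parameter defaults are illustrative, not values from the source):

```python
# Sketch: the common SVM kernels. gamma, degree, and coef0 are the usual
# tunable parameters.
import math

def linear(x, y):
    return sum(a * b for a, b in zip(x, y))

def polynomial(x, y, degree=3, coef0=1.0):
    return (linear(x, y) + coef0) ** degree

def rbf(x, y, gamma=0.5):  # Gaussian kernel, single width parameter gamma
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def sigmoid(x, y, gamma=0.1, coef0=0.0):  # hyperbolic tangent kernel
    return math.tanh(gamma * linear(x, y) + coef0)

k = rbf([1.0, 0.0], [1.0, 0.0])  # identical points
```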

Hence, some approaches are used for extending the binary SVM to the multiclass case: one-against-all, one-against-one, and Graph SVM. We review in the following the main properties of the three implementations. The one-against-all approach constructs one binary classifier per class, each trained to separate that class from all the others. The classification of new samples is made by selecting the classifier with the maximum output; in this way, the number of unclassified data is reduced. However, when the maximum output is attained by more than one classifier, the sample cannot be assigned unambiguously. Hence, this method leaves undecided regions of the feature space where more than one class claims the sample.

This is demonstrated in figure 1. Another combination method is "One-Against-One", also known as pairwise classification. The classification is done by a max-wins voting strategy, in which each binary classifier votes for one of its two classes; then the class with the most votes is assigned. During the classification step, the max-wins voting strategy is applied. However, when two classes have the same score, the sample remains unclassifiable. To resolve unclassifiable regions for the pairwise approach, a decision-graph formulation can be used: with several classes, any pair of classes can be the top-level decision node, except for the leaf nodes where single classes remain.
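
The max-wins voting used by the one-against-one scheme, in a minimal sketch (the pairwise decision function here is a toy stand-in for trained classifiers):

```python
# Sketch: "one-against-one" max-wins voting -- each pairwise binary SVM
# votes for one of its two classes; the class with the most votes wins.
from collections import Counter
from itertools import combinations

def max_wins(classes, pairwise_decision):
    """pairwise_decision(a, b) returns the winning class of the (a, b) SVM."""
    votes = Counter(pairwise_decision(a, b) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]

# Toy stand-in: class "3" beats everything (ties are where the method fails).
winner = max_wins(["1", "2", "3"], lambda a, b: "3" if "3" in (a, b) else a)
```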

For example, if the top-level classifier separates Classes 2 and 3 and decides against Class 2, the sample can belong to Class 1 or 3, and the next classification pair is Classes 1 and 3, supplied by the decision graph. The objective of this chapter is to overview the main modules composing a character recognition system; depending on the application, each module has its own importance. In the next chapter, we will introduce our system with a brief overview of recognition of isolated handwritten digits.

[Figure: pairwise decision-graph node labels, "Not 1", "Not 2", "Not 3".] This chapter investigates the combination of different statistical and structural features for handwritten digit recognition; the objective is to identify the combination that performs best. These features include some global statistics, moments, and profile features. Some of these features are extracted from the complete image of the digit while others are extracted from local regions. The experiments were conducted on a standard digit database. Handwriting recognition has been the premier research problem of the document analysis community.

The subproblems of handwriting recognition span several modalities; among these, this chapter addresses the recognition of isolated digits. Unlike alphabets, the ten glyphs of the most commonly used Western Arabic numerals are shared by many scripts and languages around the world, making them a widely applicable benchmark. The main challenges in handwritten digit recognition arise from variations in writing style. With the recent advancements in image analysis and pattern classification, sophisticated digit recognition systems have become possible. In this chapter, we are interested in enhancing the feature generation step for isolated digit recognition.

The idea is to find a combination of multiple features that performs well. Overview of isolated handwritten digit recognition: over the years, various handwritten isolated digit recognition systems reporting high accuracies have been proposed. Most of these systems have been evaluated on standard benchmark databases. Among significant contributions to digit recognition is the work of LeCun et al.

The method of Cai and Liu [34] presents an approach that integrates both statistical and structural features. Dong et al. and Belongie et al. proposed further methods; the proposed matching technique achieved high recognition rates when applied to digit images. In another notable contribution, Lauer et al. carried out classification using Support Vector Machines. A comprehensive survey on handwritten digit recognition is also available. In this competition, we have participated with two (2) methods, described in the following. Salzburg I method: the approach is based on the Finite Impulse Response Multilayer Perceptron.

First, the color images are transformed to 8-bit gray-scale images, and each pixel value is normalized into the range [-1, 1]. This method uses one partially connected network. Salzburg II method: the description of this method is similar to the previous Salzburg I, but with different parameters. The approach is based on the combination of four descriptors. For the pre-processing, a thresholding operation is applied.

For the feature generation, several descriptors are computed. For each region, a histogram of orientations is built, and a descriptor is produced by concatenating the region histograms. A thinning operation is then applied. Finally, the digit descriptor is the concatenation of the four described descriptors. The system is then developed using the selected features.

Paris Sud method: The images are first pre-processed in several steps.

Then the Hamming tree algorithm and the AdaBoost.MH implementation are used as the classifier.


The chosen classifier is an ensemble of boosted trees. In each boosting iteration, random Haar filters are tested and the best ones are chosen. The image is then magnified, and a skew and slant normalization is applied. In another method, a Delaunay triangulation is built on the input data points, and a multilevel static uniform zoning is applied. For each cell of the grid, two values are appended to the feature vector; both the centers of gravity of the triangles and the black pixels within the cell are input to this computation. The digits are then classified with a k-nearest-neighbor classifier. A further method bases its recognition module on an SVM with probability outputs.

The objective of our study is to find a combination of features which achieves high recognition rates. We have considered global and local features of both statistical and structural types. We also evaluate our technique on the same database in both normalized and non-normalized versions.

All digit images are first normalized to a fixed size; in this way, the feature generation task operates on inputs of uniform dimensions. In our case, a method based on bilinear interpolation was selected, which assigns each target pixel a distance-weighted average of the gray levels of the four nearest source pixels. Other types of interpolation can be defined by using more points with integer coordinates. Feature generation aims to express the input data using a numerical representation, or set of measurements, suitable for classification. Features are generally categorized into global and local, statistical and structural types.
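As an illustration of the size-normalization step, the following is a minimal numpy sketch of bilinear resizing (the function name and exact coordinate mapping are assumptions for illustration, not the thesis's implementation):

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Resize a 2-D grayscale image with bilinear interpolation.

    Each target pixel is mapped back into the source grid; its value is
    the distance-weighted average of the four surrounding source pixels.
    """
    in_h, in_w = img.shape
    # Coordinates of target pixels expressed in source-pixel space.
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]          # fractional vertical offsets
    wx = (xs - x0)[None, :]          # fractional horizontal offsets
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

# A constant image stays constant after resizing.
flat = np.full((10, 8), 5.0)
resized = bilinear_resize(flat, 28, 28)
```

The target size (28x28 here) is arbitrary; the thesis's actual normalization size is not specified in this excerpt.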

Statistical features represent pattern classes by numerical measurements computed from the pixel distribution. Commonly used statistical features include moments, descriptors, and geometrical properties. Examples of structural features include bends, end points, intersections, and loops. Structural properties can sometimes also be represented numerically. Each of these types of features can be extracted globally or locally.

However, each feature is more suited to one type of extraction than the other. In the following sections, we provide an overview of the features used in our study. Global features are computed from the image of the digit as a whole; the global features we employ include the density, the centre of gravity, and geometric moments. The density is computed by counting the number of black pixels and dividing by the total number of pixels. The centre of gravity, sometimes called the centre of mass, is the point at which the image mass may be considered concentrated. Moments characterize images in a way that has analogies to statistics. Generally, an image may be considered as a Cartesian density distribution function I(x, y).

Then, the two-dimensional geometric moment of order (p + q) is defined as m_pq = Σ_x Σ_y x^p y^q I(x, y). In order to express this equation in a discrete form, the image is sampled into pixels. The zeroth-order moment m_00 defines the total mass of the image; for a segmented image, this is its area, i.e., the total number of foreground pixels. The two first-order geometric moments define the centre of gravity of the image: x̄ = m_10 / m_00 and ȳ = m_01 / m_00.

These two coordinates represent the center of gravity of the digit image. The central moments μ_pq = Σ_x Σ_y (x − x̄)^p (y − ȳ)^q I(x, y) are then computed about this point. The second-order geometric moments are a statistical measure of the allocation of pixels around the centre of gravity; in mechanics, they are called the moments of inertia. They may be used to determine the principal axes, the image ellipse, and the radii of gyration.

Number of transitions: The number of white-to-black transitions (or vice versa), counted along the rows and columns in the four principal directions, is also used as a feature.

Hu proposed the application of moment invariants to image analysis and object representation.

Since then, they have been effectively applied to a large number of shape recognition problems. Hu's seven moments are invariant with respect to position, scale, and rotation. These moments capture information on the image area, its centroid, and its orientation. Hu's invariant moments are calculated using combinations of second- and third-order normalized central moments. The normalized central moments are given by η_pq = μ_pq / μ_00^γ, with γ = (p + q)/2 + 1.
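Using the standard definitions above, the normalized central moments and the first Hu invariant φ1 = η20 + η02 can be sketched as follows (a minimal illustration of the translation invariance; the helper names are ours):

```python
import numpy as np

def central_moment(img, p, q):
    """Central moment mu_pq of an image treated as a density I(x, y)."""
    ys, xs = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    cx = (xs * img).sum() / m00
    cy = (ys * img).sum() / m00
    return float((((xs - cx) ** p) * ((ys - cy) ** q) * img).sum())

def hu_phi1(img):
    """First Hu invariant phi1 = eta20 + eta02 from normalized central moments."""
    mu00 = central_moment(img, 0, 0)
    eta = lambda p, q: central_moment(img, p, q) / mu00 ** ((p + q) / 2 + 1)
    return eta(2, 0) + eta(0, 2)

# phi1 is unchanged when the same shape is translated inside the frame.
base = np.zeros((16, 16)); base[2:6, 2:6] = 1.0
moved = np.zeros((16, 16)); moved[9:13, 8:12] = 1.0
```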

Hu's seven invariant moments are then calculated from combinations of these normalized central moments; the seventh moment invariant, φ7, is also skew invariant. These seven invariant moments are included in our feature set.

The skew, or orientation, of the digit is calculated using the Radon transform of the image [47, 48]; the Radon transform of the image is the sum of the Radon transforms of each pixel in the image.

Zernike moments [49] have been widely employed in a wide variety of pattern recognition applications. Zernike complex moments are constructed using a set of Zernike polynomials.

The radial polynomial R_nm(ρ) is defined as R_nm(ρ) = Σ_{s=0}^{(n−|m|)/2} (−1)^s (n − s)! / [s! ((n + |m|)/2 − s)! ((n − |m|)/2 − s)!] ρ^{n−2s}, and the Zernike basis functions are V_nm(ρ, θ) = R_nm(ρ) e^{jmθ}. Zernike moments for a digit image are obtained by making use of the complex conjugate of this basis; the procedure for computing them can be seen as an inner product between the image and each basis function. The discrete form of the Zernike moments of an image f(x, y) defined on the unit disk is A_nm = ((n + 1)/π) Σ_x Σ_y f(x, y) V*_nm(ρ, θ), with x² + y² ≤ 1. The amplitudes |A_nm| of the Zernike moments are invariant to rotation; invariance to scale and translation is obtained by normalizing the image beforehand. In our implementation, we compute up to fourth-order Zernike moments.
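The formulas above can be sketched as follows (a minimal numpy illustration under the standard definitions; the mapping of the pixel grid onto the unit disk is an assumed convention):

```python
import numpy as np
from math import factorial

def radial_poly(n, m, rho):
    """Zernike radial polynomial R_nm evaluated at radius rho."""
    m = abs(m)
    rho = np.asarray(rho, dtype=float)
    R = np.zeros_like(rho)
    for s in range((n - m) // 2 + 1):
        c = ((-1) ** s * factorial(n - s)
             / (factorial(s) * factorial((n + m) // 2 - s)
                * factorial((n - m) // 2 - s)))
        R = R + c * rho ** (n - 2 * s)
    return R

def zernike_moment(img, n, m):
    """Complex Zernike moment A_nm of an image mapped onto the unit disk."""
    h, w = img.shape
    ys, xs = np.mgrid[:h, :w]
    # Map pixel coordinates into [-1, 1] x [-1, 1], centred on the image.
    x = (2 * xs - w + 1) / (w - 1)
    y = (2 * ys - h + 1) / (h - 1)
    rho = np.sqrt(x ** 2 + y ** 2)
    theta = np.arctan2(y, x)
    mask = rho <= 1.0                 # keep only pixels inside the unit disk
    V = radial_poly(n, m, rho) * np.exp(-1j * m * theta)
    return (n + 1) / np.pi * np.sum(img[mask] * V[mask])

digit = np.arange(25, dtype=float).reshape(5, 5)
a20 = zernike_moment(digit, 2, 0)
```

For m = 0 the moment depends only on the radial distribution, so it is unchanged by a 90-degree rotation of a square image.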

Horizontal and vertical projections are determined by counting the total number of text pixels in each row and each column, respectively. The mean and variance of these projections are used as features in our study. Left and right profiles are computed by considering, for each image row, the distance between the image border and the first foreground pixel. Like the projections, the profiles are summarized by their mean and variance.

The background feature vector is based on concavity information. These features are aimed at capturing the topology of the background regions of the digit. Each concavity feature is computed by examining, for a background pixel, which of a set of directions reach the foreground; the resulting label characterizes the local concavity configuration. In addition to the nine standard concavity configurations, we also consider five additional configurations.

These configurations are illustrated in the accompanying figure. The foreground features are computed from two different representations of the digit, contour and skeleton. Each of these types of features is discussed in the following. The contour-based features are aimed at capturing the dominant orientations in the shape of the digit. The contour is detected and followed pixel by pixel, and each step between successive contour pixels is encoded as a Freeman chain code; this generates a string of codes in the interval [1, 8].
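The chain-coding step can be sketched as follows (a minimal illustration; the particular (dx, dy) → code convention is an assumption, as the thesis excerpt only states that codes lie in [1, 8]):

```python
import numpy as np

# Freeman directions 1..8 for the 8 possible unit steps between
# neighbouring contour pixels, starting at East and going
# counter-clockwise (assumed convention).
STEP_TO_CODE = {(1, 0): 1, (1, -1): 2, (0, -1): 3, (-1, -1): 4,
                (-1, 0): 5, (-1, 1): 6, (0, 1): 7, (1, 1): 8}

def chain_code_histogram(contour):
    """Normalized histogram of Freeman chain codes along a closed contour.

    `contour` is a list of (x, y) points with unit steps between
    consecutive points; the last point connects back to the first.
    """
    hist = np.zeros(8)
    n = len(contour)
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]
        code = STEP_TO_CODE[(x1 - x0, y1 - y0)]
        hist[code - 1] += 1
    return hist / hist.sum()

# A unit square traced clockwise in image coordinates (y grows downward):
# one step in each of the four axis directions.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
h = chain_code_histogram(square)
```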

The normalized histogram of these codes is then computed and used as the feature vector. This histogram of contour chain codes is effective in capturing the dominant stroke directions; however, these features are very sensitive to noise.

The skeleton-based features are computed from the skeleton of the image of the digit, obtained by a thinning operation.

Each skeleton pixel is classified according to the number of foreground pixels in its neighborhood N(p), distinguishing end points, normal points, branch points, and cross points. Figure 4 illustrates some examples of each type of point, labeled 1, 2, 3, and 4, respectively. The normalized histogram of the occurrence of these points in a digit is used as a feature vector. These skeleton-based features capture the structural information of the digit but, like the contour features, are sensitive to noise.
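The point-type classification can be sketched by counting 8-neighbours of each skeleton pixel (a minimal illustration; the exact neighbour-count thresholds are an assumed reading of the four point types):

```python
import numpy as np

def skeleton_point_histogram(skel):
    """Histogram of skeleton point types by 8-neighbour count.

    Each foreground skeleton pixel is classified by how many of its 8
    neighbours are foreground: 1 -> end point, 2 -> normal point,
    3 -> branch point, 4 or more -> cross point.
    """
    padded = np.pad(skel, 1)
    counts = np.zeros(4)
    for y, x in zip(*np.nonzero(skel)):
        # 3x3 neighbourhood in the padded image, minus the pixel itself.
        n = padded[y:y + 3, x:x + 3].sum() - skel[y, x]
        idx = min(max(int(n), 1), 4) - 1  # clamp: isolated pixels count as end points
        counts[idx] += 1
    return counts / counts.sum()

# A horizontal 3-pixel stroke: two end points and one normal point.
stroke = np.zeros((3, 5), dtype=int)
stroke[1, 1:4] = 1
hist = skeleton_point_histogram(stroke)
```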

The Ridgelet transform combines the Radon and wavelet transforms. The Radon transform has the ability to detect lines in the image, while the wavelet transform allows localizing singularities. The Ridgelet transform has been successfully applied to a number of problems, including image compression and denoising. It is based on the Radon transform, which is computed over several angular directions.

The Radon coefficients correspond to projections representing the shadow of the shape in each direction. Consequently, significant linear features in any direction are expressed as peaks in the corresponding Radon slice. Thus, in order to characterize linear singularities, the one-dimensional wavelet transform is applied along the radial variable of the Radon domain. For an image f(x, y), the Ridgelet transform can be computed by first calculating the Radon transform R(θ, t) = ∬ f(x, y) δ(x cos θ + y sin θ − t) dx dy, where δ, θ, and t are the Dirac distribution, angular, and radial variables, respectively.

The 1-D wavelet transform is then applied to each Radon slice in order to obtain the Ridgelet coefficients; the sum of the normalized values of the coefficients is used as a feature.

Uniform grid sampling [54] is applied to the image of the digit, which allows extracting features locally. A uniform grid divides the image into rectangular regions by evenly spaced horizontal and vertical lines, where p is the vector of line positions, n is the number of horizontal or vertical regions, and k is the region index. Figure 5 illustrates an example of a digit split into a 2x2 grid. Once the image is divided into different regions, features are extracted from each region separately.
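The grid-zoning step can be sketched as follows (a minimal illustration; computing the line-position vector with evenly spaced boundaries is an assumed reading of the formula lost in extraction):

```python
import numpy as np

def uniform_grid_regions(img, n_rows, n_cols):
    """Split an image into an n_rows x n_cols uniform grid of regions.

    The line positions are n+1 evenly spaced boundaries along each
    axis, so regions differ by at most one pixel in size.
    """
    h, w = img.shape
    row_pos = np.linspace(0, h, n_rows + 1).astype(int)  # vector of line positions
    col_pos = np.linspace(0, w, n_cols + 1).astype(int)
    return [img[row_pos[i]:row_pos[i + 1], col_pos[j]:col_pos[j + 1]]
            for i in range(n_rows) for j in range(n_cols)]

# A 4x4 image split into a 2x2 grid yields four 2x2 regions,
# from which per-region features can then be extracted.
img = np.arange(16).reshape(4, 4)
regions = uniform_grid_regions(img, 2, 2)
```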

This allows a different level of granularity, and features extracted from corresponding regions can be compared across digits. A summary of the features used in our study, along with the dimensionality of each, includes: projection histograms, profile-based features, the histogram of contour chain codes, skeleton-based features, and the Ridgelet transform.

The proposed recognition engine is based on a multi-class SVM using the one-against-all strategy with an RBF kernel. The features discussed in the previous section are extracted from each digit image and fed to the classifier. Some properties should be considered when choosing the penalty parameter C and the kernel width jointly: large values of C counter-balance the bias introduced by a large kernel width, while a very small kernel width may yield a good training-set error rate but will not be useful for generalization.
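The kernel-width behaviour discussed above can be illustrated with the RBF Gram matrix (a minimal numpy sketch; the exp(-gamma·||x−y||²) parameterization is an assumption, since the thesis's symbol was lost in extraction):

```python
import numpy as np

def rbf_gram(X, gamma):
    """RBF Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
# Wide kernel (small gamma): all entries near 1, the classifier
# behaves almost linearly. Narrow kernel (large gamma): the Gram
# matrix approaches the identity, each point only similar to itself,
# which fits the training set but generalizes poorly.
K_wide = rbf_gram(X, gamma=1e-3)
K_narrow = rbf_gram(X, gamma=1e3)
```

In practice the (C, kernel width) pair is chosen by validation, as the text describes.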

With a large kernel width, the Gaussian RBF kernel becomes almost linear. We therefore select an optimal pair of parameter values by validation. The next section presents the experimental setup and the corresponding results.

In order to validate the concept of feature generation, as well as to show the robustness of our approach, we conduct a series of experiments. This work [55] is devoted to generating pertinent features in order to find the best combination. Isolated handwritten digits have the advantage of being a well-studied benchmark task.



We divide this database into three different parts. Various feature sets are generated from the normalized and non-normalized isolated digit images. We test five feature sets independently with the SVM. Finally, the recognition module is based on the multi-class SVM approach using the One-Against-All implementation.

In the following experiments, we evaluate various combinations of the features used in our work. Generally, the final feature vector is the concatenation of the individual feature vectors. This evaluation is illustrated in Table 2. A further experiment is performed with the uniform grid, which is divided into four regions; features are generated from each region as described above, so the global feature vector is composed of 35x4 components. This evaluation is also illustrated in Table 2.

A third experiment is conducted by combining two structural features, including the background features, with the three remaining methods; a feature vector is generated for each. This evaluation is reported in Table 2 alongside Experiments 1, 2, and 3.

We can note that the overall rate is acceptable for Experiment 3; however, some digits are still confused. This evaluation is significant because of the results given in Table 2. Experiment 3 shows that the use of background features without the uniform grid method is less effective; the grid is used in order to reduce confusion between the classes "4" and "7", for example. Overall, we can note that the structural feature generation methods work well. The digit image is divided into separate regions via splitting sampling, allowing the generation of local features.

The use of a uniform grid allows this local information to be captured. The database samples are contributed by different writers with varying styles. Prior to feature generation, we binarize all images using the Kittler method. Seven types of features are used in these experiments, directly extracted from the binary images of the digits. The proposed recognition module is again based on the multi-class SVM. The performance of the system is quantified by computing the precision and recall for each class.
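The per-class precision and recall computation can be sketched as follows (a minimal illustration with toy labels, not the thesis's evaluation code):

```python
import numpy as np

def per_class_precision_recall(y_true, y_pred, classes):
    """Per-class precision and recall for a multi-class digit classifier."""
    stats = {}
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))   # correctly predicted c
        fp = np.sum((y_pred == c) & (y_true != c))   # predicted c, was not c
        fn = np.sum((y_pred != c) & (y_true == c))   # was c, predicted other
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        stats[c] = (prec, rec)
    return stats

# Toy example with two classes.
y_true = np.array([0, 0, 1, 1, 1])
y_pred = np.array([0, 1, 1, 1, 0])
stats = per_class_precision_recall(y_true, y_pred, [0, 1])
```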

We first evaluate each feature set individually. It can be seen that, while some feature sets dominate, the Zernike moments (f4) and background features (f7) also realize acceptable rates. Almost all the combinations report high recognition rates, and the highest rate is obtained by combining feature sets. We also carry out a series of experiments to compute the precision and recall for each of the ten classes; the recognition results for each digit are summarized in Table 2. Relatively low recall is achieved on some digits. Furthermore, it should be noted that some pairs, like ('0','8'), ('2','7'), and ('3','8'), are frequently confused. We also compare the performance of the proposed combination of features with state-of-the-art methods.

The CVL database and evaluation protocol considered in our study are the same as those of the competition, allowing a direct comparison. This comparison, covering normalized digits and our method, is summarized in Table 2 (Comparison of proposed method with state-of-the-art methods [40]). It can be seen from Table 2 that the combination of features is competitive.

The objective of this chapter was to find a representation of isolated handwritten digits that supports high recognition rates. We proposed a combination of different statistical and structural features. The experiments conducted on a standard database of isolated digits validate this combination, and the initial results obtained by the proposed combination of features are very encouraging.

Connected digits are the most frequently observed situations occurring in digit strings. Hence, we next address their segmentation: an oriented sliding window is used for locating candidate cut positions, while an SVM-based segmentation-verification scheme using a global decision module validates them. The usual approach comprises a preprocessing step that transforms the input image, a feature extraction step, and a decision function.

In order to improve the reliability of the system, an additional step is performed for verifying the segmentation hypotheses; this scheme is usually called segmentation-recognition with verification. Generally, algorithms based on segmentation-recognition are the most common. The design of a robust handwritten digit recognition system depends on many factors, such as the quality of the input and the segmentation strategy. In this chapter, we are only interested in the segmentation stage, since it is considered the most critical. Three cases of connection can hence be distinguished. In most cases, connected digits are the most frequently observed situations [60]. Furthermore, the connection between adjacent digits can be simple (only one connection point) or multiple.

The simple connection is the most common case, and various segmentation algorithms have been developed for it.

Overview of two connected handwritten digit segmentation

Various explicit segmentation algorithms for two connected handwritten digits have been proposed. More recently, these algorithms have been compared objectively using synthetic and real data. In the following, we briefly review the most popular algorithms for segmenting two connected digits.

Many algorithms based on the contour and profile have been proposed for segmenting two connected digits. The first was proposed by Fenrich and Krishnamoorthy [67], who use the vertical histogram projection and the contour to derive the cutting path. Fujisawa et al. and Congedo et al. proposed related approaches; however, the main difficulty of using the latter algorithm is finding the best start point for the cut.

Therefore, completely ineffective segmentation paths can be produced. Shi and Govindaraju [63] proposed another approach to segmenting two connected digits.