Immediate Access to Unstructured Information

Beyond Relational: Designing and Deploying Applications for Mission Success Today

Building information systems on relational database management systems (RDBMSs) carries major risks today. RDBMSs date from a time when information was still structured and databases were queried in fixed ways. Now that data is growing exponentially, arrives in all kinds of formats, and changes at lightning speed, organizations are looking for alternative information management systems that do offer the right flexibility. This whitepaper describes the many advantages of MarkLogic Server, which enables immediate, user-friendly access to any collection of unstructured information, from any device or system.


November 2011

The End of the Relational Database

Immediate Access to Unstructured Information

How MarkLogic Ensures Success

Table of Contents

· Introduction
· Today's Mission Success Requirements
  · Exploit Heterogeneous Information
  · Provide the Right Access
  · Deliver on Time and Within Budget
· How MarkLogic Ensures Mission Success
  · Easy Information Aggregation
  · Search, Analytics, and Delivery at Scale
  · Rapid Application Development
· Conclusion
· About MarkLogic

Introduction

This paper discusses the advantages of using MarkLogic Server for designing and deploying applications for mission success. It is intended for agency executives who are tasked with delivering mission-critical information applications.

Technology choices have been largely one-sided. Relational database management systems (RDBMS) from vendors like Oracle become the default selection in many projects because of existing enterprise-wide site licenses and widely available expertise. But accepting the default selection is no longer satisfactory considering the sea change of information. RDBMSs were invented when information was highly structured and consistent, low in volume, entered through forms, and retrieved with a well-known set of queries. Information is now heterogeneous, unannounced, changing, and exponentially growing.

This change requires a new way to get work done that goes beyond leveraging incumbent tools. Henry Ford said it best, "if I had asked people what they wanted, they would have said faster horses." New mission requirements today point agencies in a different direction. Using the wrong tool results in slower time-to-deployment and getting far less for overall expenditures. It puts project success at risk, and may also put the entire mission at risk. Agencies need to explore alternatives to RDBMSs to avoid project failure, and MarkLogic Server, a purpose-built database for unstructured information, is one such alternative.
By "unstructured information," we simply mean information types that do not fit well in the relational model. In certain cases, this information might be highly structured or semi-structured, but structured in a way that does not fit easily in rows and columns and requires significant pre-processing or "normalization" to load into an RDBMS. Examples include: raw data, metadata, geospatial data, doctrine, email, MULTI-INT, and message traffic. MarkLogic is optimal for information that does not fit well in the relational model. Since each organization has its own terminology for this type of non-relational information (structured, semi-structured, or unstructured), this paper will refer to it as "information."

Today's Mission Success Requirements

Today, mission success continues to depend upon technology solutions that support effective information management. The requirements, however, have evolved over the years, so the solutions must evolve as well. The main requirements are described below, grouped in three categories. All requirements must be collectively met to increase the chances for mission success.

· Exploit heterogeneous information
  · Integrating structured, semi-structured, and unstructured information
  · Sharing information from multiple sources and in varying formats
  · Rapid incorporation of unannounced new information sources
· Provide the right access
  · Large, scalable information databases accessible in real time
  · Multiple forms of user access: search, faceted navigation, geospatial
  · Delivery to many types of devices and systems
· Deliver on time and within budget
  · Quick stand up of new and enhanced applications

The following sections elaborate on these requirements, and subsequent sections in this paper will cover how MarkLogic addresses them.

"This notion of thinking about data in a structured, relational database is dead."
Vivek Kundra, Federal CIO, Open Government and Innovations Conference, July 21, 2009

Exploit Heterogeneous Information

Exploitation of heterogeneous information involves aggregating diverse information types from many sources into a central repository. This entails loading not only a known set of information sources, but also new and unannounced sources of information in distinct formats and structures. These unannounced sources range from MULTI-INT collectors, to non-standard metadata catalogs, to foreign websites. Once imported, information is available for search and analysis through a unified user interface.

With aggregated heterogeneous information, users have more information at their disposal than if they were to access limited repositories. The ability to see all related information with a single query gives users a more accurate view of all relevant, available information. In addition, the aggregated collection enables users to spend more time taking action on the information they find, and less time navigating across data sources. The use of task-specific "data silos" of homogeneous information is no longer satisfactory as a sole option, since organizations are recognizing the need to share information within and across agencies.

The major challenge with exploiting heterogeneous information is modeling multiple formats in a collectively useful way. Unless the system was designed for heterogeneous information, administrators and developers will encounter hurdles. For example, RDBMS-based systems require a tradeoff between ingestion speed, index creation, retrieval speed, and maintenance. An administrator might be able to load numerous information types quickly. However, to get reasonable retrieval performance, optimal indexing is required, which significantly extends the time to load information. With tools that add XML support to RDBMSs such as Oracle's XML DB, some of the modeling effort is simplified, but the indexing strategy still requires lengthy planning to achieve a usable, scalable system.

And once all information is loaded, there are still challenges around converting the information stored as rows and columns into a usable output format. A large amount of unique code is required to access each information type, so application development becomes excessively costly. And whenever new information types are added, the challenges are compounded since the application has to be recoded to handle the new formats.

"When storing unstructured/semi-structured data, look at mature native XML databases that offer better support and performance, especially for large environments."
Noel Yuhanna, Forrester Research

Provide the Right Access

With constant global change from flash crashes to legislation like the Freedom of Information Act, expectations around information immediacy are increasing. And considering the U.S. Government creates more information than any organization in the world, scalability is a critical feature for today's systems. Also, users need powerful search, retrieval, analysis, and delivery capabilities to quickly find the information they need.

Without the right access for users, organizations end up making huge compromises. Whether a mission involves situational awareness, battlefield data, or status of persons-of-interest, agencies should not risk failure because of technology shortcomings. Delays in information availability are costly because the value of certain information rapidly decreases over time. Exponential information growth should not require an exorbitant expenditure on infrastructure that does not scale well. Users should not struggle to find what they need in the volumes of changing information due to missing capabilities such as faceted navigation, geospatial search, and real-time alerting. And finally, users should not be confined to their desks to get information, as the growth of mobile delivery should enable immediate access to information at any location.

Providing the right access to information is difficult with systems that were not designed to meet the information challenges of today. These systems lead to excessive spending on hardware, software, and integration when efforts should be focused on building mission-specific features. RDBMS-based systems require high-end servers for optimal performance, and need many of them to scale with government-sized information. As discussed earlier, ingesting information in a useful way is slow, leading to unacceptable latency. Features like real-time alerting and mobile delivery are often relegated to a lower priority only because they are too expensive to implement on traditional technologies.

Deliver on Time and Within Budget

Constraints with schedules and costs are always inherent in application development projects. Time is always limited because users continually demand more tools for their daily tasks. And once new tools are promised, users want them as soon as possible. Staying within budget is an obvious concern, as keeping costs within expectations allows project completion. Consistently finishing projects on time and within budget reinforces business and leadership skills, lowers perceived risks, and facilitates the justification of future application development projects.

To attain on-time delivery within budget, organizations must find ways to more quickly stand up new and enhanced applications. With faster application development, organizations achieve faster time-to-mission and put required tools into users' hands sooner. Also, the reduced development timeline helps reduce project costs. One other critical benefit is that faster development allows teams to put more effort into high-value features and less time on low-level tasks. Teams can build complete, effective solutions that let users be successful.

A common challenge organizations face is trying to build critical features for heterogeneous information into an application based on an RDBMS. Developers find all of the factors mentioned earlier in this paper, including multiple data formats, information volume, and advanced retrieval features, are hard to address with RDBMSs. More time than necessary is spent during design, development, deployment, and maintenance, resulting in longer timelines, higher costs, less effective applications, or all three.

How MarkLogic Ensures Mission Success

Achieving mission success is difficult without the right tools. Many organizations leverage RDBMSs as general purpose tools for managing information and in doing so, spend more time on application development than necessary. RDBMSs like Oracle were built for the structured, unchanging data most common decades ago, but are not suited for most other critical information. But because of the ubiquity of RDBMSs, organizations continue using them for information without exploring better alternatives. MarkLogic's experience in government installations shows it to be a better technology alternative because of its core strengths in easy information aggregation, search/analytics/delivery at scale, and capabilities with rapid application development.

Easy Information Aggregation

MarkLogic is the right tool for exploiting heterogeneous information because it is schema-agnostic, meaning it loads information "as is" without requiring extensive up-front data modeling. Information does not have to be normalized into rows and columns, and extensive index planning is not required to create a high-performance system. This facilitates aggregation of multiple data sources and multiple formats, and often saves several months of design time and cost. The time savings have enabled project teams to focus more on value-adding features in their applications.

Contrast this to the relational model, in which information modeling is time consuming and requires significant design effort. Extensive tweaking is required as the business case or data evolves. Mapping information into rows and columns puts an extra burden on performance tuning, as it is often hard to predict which indexes to build to accommodate user behaviour and expectations. And even when the data model is right, new and changing data formats require modifying the model, adding more overhead to the development effort, and may even require starting from the beginning. This can involve several months of effort, or longer depending on the complexity of the information.

Many customers switched from Oracle to MarkLogic because they were spending too much time trying to consolidate disparate information formats into a single data model. With MarkLogic, they were able to avoid the significant ongoing modeling effort.

Search, Analytics, and Delivery at Scale

MarkLogic natively provides many powerful retrieval capabilities such as faceted navigation, geospatial search, real-time alerting, advanced full-text search, and co-occurrence analysis that were built from the ground up with high performance and scalability in mind. MarkLogic is built on a shared-nothing architecture that enables linear scaling on commodity hardware. And since MarkLogic uses XML as its data model, information can be easily updated and retrieved at any granular level, and dynamically transformed into a variety of output formats. Multi-device delivery is possible without significant coding and re-coding because there is no data translation from rows and columns into common delivery formats.

MarkLogic's breadth of functionality for information gives developers the tools they need to build powerful, mission-critical applications. Although many powerful applications have been built with an RDBMS at the core, those applications typically require years of development effort to provide the access users require. Since RDBMSs were not designed for information, extensive integration work with third-party technologies is required. That approach introduces a number of problems, including increased hardware requirements, more code to maintain, architectural complexity, and latency. While many application developers continue to struggle with these issues on an RDBMS, the efficient and effective approach is to leverage a DBMS purpose-built for information.

Rapid Application Development

Speed and agility are the major characteristics of the application development process on MarkLogic. The native services-oriented architecture in MarkLogic Server allows faster and easier development of web-based applications. As discussed earlier, MarkLogic can load information "as is" to drastically reduce design time. With its unified architecture, integration with certain external technologies such as search engines is unnecessary. And since it stores information as XML, dynamically transforming information into a variety of outputs is greatly simplified and enables faster application development.

Many other features also contribute to faster application development. MarkLogic Application Services is the set of products that further expedite the information loading and application development processes. Connectors and toolkits for MarkLogic enable quick integrations with popular server and desktop software packages. And the entire services ecosystem MarkLogic Corporation provides (including support, training, professional services, technology partnerships, and partner certifications) ensures MarkLogic implementations are successful and aligned with organizational goals.

While RDBMSs like Oracle have gained a large ecosystem of tools for expediting application development, these tools only help for information that works well in the relational model. These tools do not sufficiently address the unnatural efforts required to store and retrieve information in an RDBMS. Extensive data modeling, extensive integration work, data translation, and excessive administrative overhead all threaten on-time and in-budget delivery.

"Alas, when used to build content-centric applications, the ROX (relational/object-oriented/XML) dev stack yields demonstrably and startlingly poor results in terms of developer productivity, project predictability, and application performance."
Lyn Robison, Research Director, The Burton Group, "The Methodology for Overcoming Data Silos (MODS): Using the New XQuery Development Stack"

Conclusion

RDBMS technology is well established and is often the default choice for many organizations. But organizations are recognizing that issues such as complex application architectures, slow performance, insufficient functionality, higher overall costs, and longer timelines are inhibiting mission success.

MarkLogic technologies are all about giving customers speed, agility, IT efficiency, and effectiveness. MarkLogic customers have built research portals, iPhone/iPad applications, intelligence repositories, metadata catalogs, social applications, and other information-rich applications with significant quantifiable gains around development time, performance, and scalability. Customers report significant benefits when using MarkLogic including:

· 50% reduction in development time
· 70% reduction in amount of software code
· 90% reduction in overall development costs
· 120 times faster information ingestion (for example, 12 minutes to six seconds)
· 100 times increased concurrent user capacity
· 100 times faster ingestion of Department of Defense Discovery Metadata Specification (DDMS) information
· 95% reduction in database administration costs (for example, 0.5 FTE for a 100 server cluster)

The gains are due to MarkLogic's architecture. First, MarkLogic stores information in XML, which can easily transform into web formats like HTML, XHTML, and RSS, eliminating complex code to translate between data types. Second, MarkLogic's unified architecture, consisting of a DBMS, search engine, and application server, removes code to integrate critical functionality. Third, a built-in framework for integration with third-party entity enrichment tools enables an easy way to enrich information. And finally, complementary products like Application Services enable easier information loading, and rapid prototyping and development of baseline search applications that can be further customized with unique features. The combination of all these native capabilities increases the chances for mission success.

About MarkLogic

MarkLogic Corporation is revolutionizing the way organizations leverage information. MarkLogic customers in industries including media, government, and financial services use MarkLogic to develop and deploy information applications at a fraction of the time and cost as compared to relational databases and search engines. The company's flagship product, MarkLogic Server, is a purpose-built database for unstructured information.

When we talk about unstructured information, this is our shorthand for information that does not easily fit in the relational model. If you take XML as an example, it is actually quite structured but just in a way that often requires huge amounts of code, hundreds of columns, and dozens of tables when storing it in an RDBMS. Anyone who has gone through an exercise of trying to store XML in a relational database and then run queries against it at speed and scale usually wishes there was a better way. MarkLogic makes this much easier to accomplish with significant improvements in both performance and scalability compared to RDBMSs.

MarkLogic Corporation
One Kingdom Street
Paddington Central
London W2 6DB
+44 (0) 203 402 3619

Version 2 April 2011

© Copyright 2011 MarkLogic Corporation. MarkLogic is a registered trademark and MarkLogic Server is a trademark of MarkLogic Corporation, all rights reserved. All other product names mentioned herein are the property of their respective owners.