G30 Consultants
[Hero image: Clee Hill]

G30 Consultants provides objective consultancy on technology, architecture and the process of designing, developing and deploying software systems. The basis of this authoritative and broad coverage is experience gained over nearly forty years: computer manufacturers at Apricot and Tandon, systems software publishers like Digital Research and Novell, financial systems with Pegasus Software, startups like Joost, platform architecture and engineering with BBC Online and STEM Publishing, and technology enterprise architecture for Elsevier.

This website shows some of our ongoing projects, publications and documentation; in doing so the aim is to show the breadth of coverage of technology today. Scientific Research Publishing is a Flipboard magazine on the business of scientific research publishing with ~1,500 subscribers. The Walled Garden and Prairie is a Chautauqua on philosophic tensions in architecture, security and infrastructure. The Wiki is a place for us to publish documentation and articles in an open and public way.

Projects

Sevilla

Seville Project

Capability Mapping Tool.

Using a curated taxonomy, the tool enables the mapping of an organisation to its capabilities, both internal and external. Progress includes cleaning up the SIC and NAICS codes.

Common diagrammatic vocabulary of Architecture

RepoSearch PrePrint Repo Search and Discovery

Publications

Scientific Research Publishing

Wide variation in UK universities' REF open access compliance

New data reveal widespread variation between UK universities’ readiness for the research excellence framework’s open access requirements.

Under the …


Sustainability of Article Publishing Charge to Further Open Access

[Prathima Appaji, Creative Commons USA, Link (CC-BY)] In the field of academic publishing, there are a variety of models. Many journals use …

Six principles for assessing scientists for hiring, promotion, and tenure

The negative consequences of relying too heavily on metrics to assess research quality are well known, potentially fostering practices harmful to …

What if you choose to ignore IF (impact factor)?

Subhash Chandra Lakhotia, a former Professor of Zoology, is currently an INSA Senior Scientist and Distinguished Professor at Banaras Hindu …


Insights

Introduction

‘Of course, you always liked books’, said a relative when he heard I was taking on a role at the British Library. Now, I have always …

Open Science in Indonesia

Terima kasih (thank you) to Afrilya, Surya Dalimunthe, Sami Kandha Dipura, and Dasapta Erwin Irawan from the Open Science Team Indonesia for their valuable input …

Looking beyond open access publishing

Vimal Simha, Bengaluru, May 22 (Research Matters): Sweden's recent decision to terminate a contract with a scientific publisher, because the publisher …


Open and Transparent Science Powered By Blockchain

Our Team

Meet the people behind the scenes

Founders

Manuel Martin

Manuel's career has been focused on supporting large collaborations through technological …

The Walled Garden & Prairie


This Chautauqua is a series on what are generally seen as two major incompatible views on how to design, implement and operate information systems. The aim is to explore how both general approaches make sense in different ways, and in combination rather than in antagonism. In choosing Walled Garden and Prairie to categorise these two approaches it might seem that I have a negative bias to one or the other, but that really isn't where this is coming from. Walled Garden is often treated as a derogatory term, implying hard perimeters, command-and-control hierarchies and long-term planning, but in its original use it is also about providing the right conditions for different plants with different requirements, and a structured navigation which allows maintenance and gardening without obstructing or affecting the rest of the garden. In its way the Prairie might seem uncontrolled, borderless, vulnerable and inefficient, but it is also resilient, because there is no single point of failure and burning it down periodically keeps it healthy.

Interfaces All the Way Down

Page edited by Simon Lucy

A common habit when coding a new service, or a class with methods, is to postpone actually writing the code that does the thing by adding another layer. The initial impetus for this is good: abstract the detail from the caller, use the interface to be able to flex what the code does, and so on. Often, though, it becomes a kind of procrastination: oh, this layer isn't quite right, I need to orchestrate some dependency and filter what happens depending upon some logic that really doesn't belong at this level. And one of the ways out of that conundrum is to make another layer (or, in some languages, a friend class).
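The habit can be sketched in a few lines. This is my own illustration, not code from any G30 project: a hypothetical pricing service, where the layer that finally does the thing gets wrapped in a discount layer whose orchestration logic arguably belongs with the caller.

```python
from abc import ABC, abstractmethod

class PriceProvider(ABC):
    """The interface everyone codes against."""
    @abstractmethod
    def price(self, sku: str) -> float: ...

class CatalogPrices(PriceProvider):
    """The layer that actually does the thing."""
    def __init__(self, table):
        self._table = table

    def price(self, sku: str) -> float:
        return self._table[sku]

class DiscountLayer(PriceProvider):
    """The extra layer added 'for flexibility': orchestration and
    filtering that doesn't quite belong at the catalogue level."""
    def __init__(self, inner: PriceProvider, rate: float):
        self._inner = inner
        self._rate = rate

    def price(self, sku: str) -> float:
        return self._inner.price(sku) * (1 - self._rate)

base = CatalogPrices({"bike": 100.0})
layered = DiscountLayer(base, 0.1)
```

Each wrapper is individually defensible; the procrastination described above is when the stack keeps growing because no layer ever feels like the right home for the logic.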

Walls and Trust Nothing

Page edited by Simon Lucy



At its simplest level, the most obvious feature of the Walled Garden is the Wall. Walls have all sorts of connotations for us, especially at this moment in history, and they begin with protection by exclusion. Being surrounded by a wall gives us a feeling of security and control over who or what can gain access inside, and control over what and who leaves. In its original use, for a real garden, the wall was much more about protection, shelter and providing the right environment for the plants and crops. The surrounding wall warms in the sun and raises the ambient temperature for plants in the north of the garden; it provides shelter from prevailing and strong winds; and it organises the land, separating it for management.


Do the walls around our information systems share similar nurturing properties?

Yes. You can make a case that treating the Wall as an interface, or inversely interfaces as walls, lets the code within those walls consume and provide services and information specific to those interfaces (or walls). This takes the metaphor of the Wall to mean more than just an infrastructure- and Enterprise-defining concept. We'll get onto that soon.
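As a minimal sketch of that reading (my illustration, not from the article): the wall is the class boundary, everything inside it is private, and the only supported way to consume or provide anything is through the narrow, explicit gate.

```python
class WalledGarden:
    """The wall: private internals behind a small public surface."""

    def __init__(self):
        self._beds = {}  # state inside the wall; no direct access

    # The gates: the only supported ways in or out
    def plant(self, bed: str, crop: str) -> None:
        self._beds.setdefault(bed, []).append(crop)

    def harvest(self, bed: str) -> list:
        # Hand out a copy, never a reference to the internals.
        return list(self._beds.get(bed, []))

garden = WalledGarden()
garden.plant("north", "tomato")
```

Callers can do anything they like with what the gate hands them, but nothing they do reaches back inside the wall.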


Hard Perimeter with DMZ

But is this feeling of security really justified, or is it actually a security blanket, merely comforting? If the wall is so impenetrable that nothing external can enter, even when invited by internal requests, is that security? Is that safer? Yes, it's safer; it's how the world was prior to the public internet and external services: each organisation, each installation of an organisation, a locked-down, impenetrable lump.

Well, except when it wasn't, when the area outside the wall had its own perimeter space, a DMZ, a demilitarised zone with a public interface (probably to some packet switching network). In the DMZ files came in, and files were dropped for external systems to pick up, probably using FTP. This was fine while there was little interconnectivity between organisations, but even before the public internet there was a need, and there was a whole set of protocols like EDI to exchange data (somewhat arthritically). And when the public internet and the Web did arrive, the same model was reused, with web servers sitting in the DMZ outside the trusted area. This is when trust started to leak into the hard perimeter: the web servers needed content and data, and had to send any transactions into the internal systems.

Sure, copies or abstracts of databases could be staged out in the DMZ, but data still had to flow at some point, and the means of security remained the same: once across the firewall almost everything was addressable, and if access was controlled it was by account and password. On the whole, once in a network space every device was potentially accessible. The combination of web browsers, web services and thin or no interfaces between service and data led to minor and magnificently catastrophic exploits. And still does.

Considerable effort and work was put into network architectures to try to separate applications, data and control, and that works at an infrastructure level: the virtual networks can be kept separate. But there is still the problem of identity and authorisation. Relying on IP addresses and routing rules, whether in firewalls, routers, switches or servers, does not scale, and that same problem is easily repeated in soft networking using cloud architectures.



G30 on Data Articles

They Are Asking For What?

Page edited by Simon Lucy

In the preparation for the GDPR and the UK refresh of the Data Protection Act, it seems that some organisations, especially government departments and public bodies, are asking some very awkward questions:

  • Assurances, pledges, loyalty oaths that an organisation is 'GDPR ready'.
  • Copies of the organisation's policies and procedures relating to GDPR.
  • Claiming a right of audit.
  • Contractually requiring agreement and proof of deletion, 'forgetting' and/or returning of data.
  • Contractual requirements which duplicate, or place on the organisation, the burden of handling personal data provided by the contracting party.


There are probably others doing the rounds as well. Prefacing what I'm going to say with IANAL (I am not a lawyer), it makes sense to apply some rationality to this and maintain the proper borders between organisations. In general, I do not believe that anyone has to promise anyone else, including the government, that they are going to obey the law. Extra contractual conditions won't provide any kind of indemnity to either party; remember that organisations that supply data have a duty of care to both the owners of the data (individuals) and whoever they share it with, to ensure they indeed have the necessary permissions. There could potentially be a lot of 'indemnities' swapped around, but of course they mean nothing.

The same really holds true of being asked for copies of policies, procedures or even implementations. This isn't new behaviour; there are some companies that seem to love forcing their own procedures down suppliers' or even customers' throats: being asked for the details of ISO 27001 compliance, the procedures for generating, holding and rotating keys, and so on. Each of these requests can be managed with a blanket 'We maintain and regularly review our policies in this and other areas in the light of best practice and the regulatory conditions applicable at the time, including any legal requirements. The specifics of our policies, procedures and implementations are naturally commercially sensitive and are not to be shared.'

This is especially true of a right of audit. Unless the specifics of handling data can be entirely separated by supplier or third party, it would be very difficult to allow a third-party audit without also exposing other data to which the auditor had no reasonable access.

The last two points, contractually requiring or appearing to place the burden of performing the mandated deletion or forgetting of personal data, are going to be awkward for supplier, third party and the owner of the data alike. No organisation is going to want the unmitigated risk of a third party not being able to effectively delete or forget the data they've had, but this in no way absolves the data-collecting company from its own obligations. There are practicalities around the deletion and forgetting of data which can mitigate some of the risk, but first think about the shared risk of a collecting company and the third parties it may use to process, transform or manage that data. Those third parties are not just companies like Cambridge Analytica (who never needed identified information anyway) but any kind of service or processing organisation: lawyers, accountants, consultancies, outsourced IT, offshoring, marketeers, job services, SaaS applications, on and on. All of them will have a shared risk.

Unlike operational risks, business risks should be singletons. If there isn't one policy, procedure and implementation in place, but many, each determined by contract, there will be failure. The decision tree for making this sane is fairly obvious, but there are a couple of things that might not be:

  • This is a trust process, so be transparent about what can be transparent, and transparent about where the borders are.
  • Ask the same transparency of your supplier, customer, etc.
  • Eliminate the possibility of being able to use aggregated data before exploring the need to share personal data.
  • Still before sharing personal data, thoroughly explore pseudonymous data (really).
  • Before giving access to personal data, explore the options of escrow: the only copy of the data is managed, possibly by one third party whose business is providing this service and who has no interest in the data itself (just like any other due diligence process between two or more parties).
  • If you do have to share data, do not (that should be in CAPS), DO NOT, ship data; control access instead. This can even work for the third-party processor if they have to take the data set. But they don't have to, do they?
  • Deletion, forgetting and archive: agree what they mean before you start doing this at all.
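The pseudonymisation step above can be sketched with nothing more than the standard library. This is an illustrative pattern, not a claim about any specific product, and the key name and record layout are hypothetical: a keyed hash (HMAC) gives stable pseudonyms, so records can still be joined, while a party holding only the shared file cannot reverse the identifiers without the controller's key.

```python
import hashlib
import hmac

# Hypothetical secret held only by the data controller, never shared.
SECRET_KEY = b"controller-only-example-key"

def pseudonymise(identifier: str) -> str:
    """Stable, non-reversible pseudonym for a personal identifier."""
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"),
                    hashlib.sha256).hexdigest()

record = {"email": "jane@example.com", "purchases": 3}

# What actually leaves the organisation: no direct identifier at all.
shared = {"subject": pseudonymise(record["email"]),
          "purchases": record["purchases"]}
```

Note that stable pseudonyms are still personal data under the GDPR as long as the key exists, so the key itself needs the same governance as the raw identifiers.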

On the last point, which I want to elaborate on in lots of detail one day soon: it's important not to get bound up in definitions of deletion, forgetting and archive that require adjusting the laws of physics. The regulation so far describes all of these states in terms of availability to the 'system' and access or use. Those are the states to concentrate upon. There are complications relating to archive and to legacy systems of many years' standing, but there are also ways of handling them.
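One way to make 'availability to the system' concrete, as a sketch of the general idea rather than a reading of any particular regulation text: the store enforces that a forgotten subject can never be retrieved again, with physical scrubbing as a best-effort second step.

```python
class SubjectStore:
    """Deletion modelled as unavailability: once a subject is
    forgotten, the system can no longer retrieve or hold their data."""

    def __init__(self):
        self._records = {}
        self._forgotten = set()

    def put(self, subject: str, data: dict) -> None:
        if subject in self._forgotten:
            raise PermissionError("subject has been forgotten")
        self._records[subject] = data

    def get(self, subject: str) -> dict:
        if subject in self._forgotten:
            raise KeyError("subject has been forgotten")
        return self._records[subject]

    def forget(self, subject: str) -> None:
        self._forgotten.add(subject)
        self._records.pop(subject, None)  # best-effort physical removal
```

The guarantee lives at the access path, not in the storage medium, which is also why the same pattern can front a legacy archive that cannot be physically scrubbed.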


CC BY-SA 4.0

Do We Need This?

Page edited by Simon Lucy

The GDPR forces us to look at our data and categorise it as personal, personally identifiable, and everything else (keeping in mind that what was once impersonal can become personally identifiable in association), but often we don't question why we collect and store this information. It is already part of existing Data Protection legislation that only data that is necessary should be collected, and then kept only for as long as it is necessary. Rarely do we consider whether data or metadata is useful in itself once we add it into our model and data stores. Often we start collecting it for some future use which is neither clear, decided nor planned; and once we have it we keep it, because it's data and must be valuable.

I'm suggesting that we not collect common personal categorisation data unless there is an overriding need and that for the overwhelming cases there is no such need. This thought was provoked most recently by this Tweet.

Ada Rose Cannon ada@mastodon.social Retweeted ProPublica

As a developer if you are ever asked to do something like this. Pause and look at yourself and what you are enabling. "Someone else would do it so I might as well be paid for it" is not an excuse. Don't build evil. Don't enable evil systems. We need a tech hippocratic oath.


My initial response was:

Simon_Lucy Retweeted Ada Rose Cannon ada@mastodon.social

This sounds easy to avoid, but there are simpler, basic enablers. Collecting and indexing classifiers such as ethnicity, gender and gender preference allows populations to be targeted for whatever purpose.

It is straightforward not to engage in writing systems and applications that can be used to aid prejudice and foster division, but it's very hard to avoid modelling and designing into data stores categorisations that can be used in ways which are prejudicial to the owners of that personal data.

But what about needing to know who is affected by this or that prejudice and persecution, so we can protect them and improve their lot? Surely we need to collect information to identify those parts of the population that need help? But do you need to count in order to know the right way to treat everyone? Is the counting and identifying itself the wrong?

For specific needs does someone really need to identify themselves as disabled, or are they really an individual with a requirement?

Personally, I don't categorise myself in any ethnic or religious category on any form, and would avoid gender and age if I could. As a modeller and architect, would I argue against collecting this data? I would now.

I would employ all the arguments about not collecting personal data unnecessarily. Does your application or system require gender to be relevant? Really? Age? Should the provision of public services need ethnic data? And so on.

Really ask if each one of these metadata categories is necessary, bearing in mind that each will likely come from a controlled list, plus perhaps 'other'. What purpose will be served? If it's for some broad population-statistics use, ask how the category gives meaningful information that actually matters in a statistic. Take Male/Female (I won't fall back on self-described gender to begin with; the traditional simple case should suffice): how does it help knowing someone ticked either box? Will they buy or be interested in a different product or service? Will they want different information? Will the content be filtered?

If the answer is yes, I'd ask: so you wouldn't sell to someone of the wrong gender? Would you only show pink bikes to girls? Carbon-fibre drop handlebars to boys? I'd hope not, so how does your case differ?

Are the services, products and information that I'm interested in, in any way, absolutely connected with my age, gender, ethnicity or mobility? I don't think so. The actual services, products and information might very well have particular characteristics that I want to filter and search by, but not necessarily because I possess or share those characteristics.

If you're building recommenders, why limit, filter or weight your recommendations based upon any of these largely loose, non-authoritative categories? Isn't the behaviour and content far more important? Quite a while ago now, over ten years, we were involved in building a streaming personal video platform, and the advertising people wanted us to include something like 200+ questions on the user's likes and dislikes. That got rejected fairly quickly just in contemplating the registration funnel, but it did betray the then assumptions about how to collect and slice and dice population data: analyse the individual, get as much as possible about the individual from the individual, and apply that to the content, product or service. It came straight out of the publishing industry, with its cards for subscribers to punch or circle their characteristics.

Now of course we apply it the other way round: the behaviour of the individual and their peers or group, their history of success, failure or abandonment, and the content, product or service chosen by that group, along with many other dimensions of behaviour, applied to new and evolving content, products and services: all that Big Data stuff. And we don't really need all that categorisation up front; in fact it could skew the results badly.
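A toy sketch of that behaviour-first approach (my illustration; the user and document names are made up): recommendations come from co-viewing counts alone, with no demographic field anywhere in the model.

```python
from collections import Counter
from itertools import combinations

# Viewing histories: who consumed what. No age, gender or ethnicity.
histories = {
    "u1": {"docA", "docB"},
    "u2": {"docA", "docB", "docC"},
    "u3": {"docB", "docC"},
}

# Count how often each pair of items was consumed by the same user.
co = Counter()
for items in histories.values():
    for a, b in combinations(sorted(items), 2):
        co[(a, b)] += 1
        co[(b, a)] += 1

def recommend(seen: set, k: int = 2) -> list:
    """Rank unseen items by co-occurrence with what was already seen."""
    scores = Counter()
    for s in seen:
        for (a, b), n in co.items():
            if a == s and b not in seen:
                scores[b] += n
    return [item for item, _ in scores.most_common(k)]
```

Everything the recommender knows comes from behaviour, which is exactly why the upfront demographic questionnaire adds nothing and can only skew the signal.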

But for those organisations whose data sets were modelled and collected well before this Big Data magic, and who have all this carefully researched (or not) data: has it been reevaluated? Or is it sitting there in the rest of the data, consciously or unconsciously splitting your data sets into what might be irrelevant and even misleading subsets?

I think a great many of these characteristics should not be collected and stored, and they should all be reevaluated periodically. I include official forms in this; actually, I especially include official forms in this.

I'd be interested in any comments or counter arguments people have.

CC BY-SA 4.0

GDPR What it means & What needs to be done

Page edited by Simon Lucy

An executive summary of the GDPR and the steps any organisation needs to take to comply with the legislation and be able to answer audit enquiries.

CC BY-SA 4.0


Contact Us